Kops on a disconnected environment #16453

Closed
dormullor opened this issue Apr 5, 2024 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dormullor

dormullor commented Apr 5, 2024

/kind bug

1. What kops version are you running? The command kops version will display this information.

1.26.3

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.26.4

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
Manage your own security group and allow egress traffic only for internal communication (block 0.0.0.0/0 and allow only the VPC CIDR), then run:

 kops update cluster **** --yes --lifecycle-overrides SecurityGroup=Ignore,SecurityGroupRule=Ignore
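For context, a minimal sketch of that egress restriction using the AWS CLI (the group ID is a placeholder; 172.20.0.0/16 matches the networkCIDR in the manifest below):

# Drop the default allow-all egress rule, then allow egress only within the VPC CIDR
aws ec2 revoke-security-group-egress \
  --group-id sg-xxxxxxxx \
  --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'
aws ec2 authorize-security-group-egress \
  --group-id sg-xxxxxxxx \
  --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=172.20.0.0/16}]'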

5. What happened after the commands executed?
The command exceeded its timeout.

6. What did you expect to happen?
When SSHing into the master node, the nodeup process exits with the following error:

Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.305209    1035 s3context.go:192] unable to get bucket location from region "us-east-1"; scanning all regions: RequestError: send request failed
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: caused by: Get "https://s3.dualstack.us-east-1.amazonaws.com/r*****?location=": dial tcp 52.217.230.168:443: i/o timeout
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374846    1035 s3context.go:298] Querying S3 for bucket location for ****
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374904    1035 s3context.go:303] Doing GetBucketLocation in "eu-west-3"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374911    1035 s3context.go:303] Doing GetBucketLocation in "us-west-2"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374930    1035 s3context.go:303] Doing GetBucketLocation in "eu-west-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.375066    1035 s3context.go:303] Doing GetBucketLocation in "ca-central-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378346    1035 s3context.go:303] Doing GetBucketLocation in "ap-northeast-3"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378520    1035 s3context.go:303] Doing GetBucketLocation in "us-east-2"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378718    1035 s3context.go:303] Doing GetBucketLocation in "eu-south-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378767    1035 s3context.go:303] Doing GetBucketLocation in "us-west-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378885    1035 s3context.go:303] Doing GetBucketLocation in "eu-central-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378406    1035 s3context.go:303] Doing GetBucketLocation in "ap-south-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378418    1035 s3context.go:303] Doing GetBucketLocation in "eu-north-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378439    1035 s3context.go:303] Doing GetBucketLocation in "ap-northeast-2"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378454    1035 s3context.go:303] Doing GetBucketLocation in "ap-northeast-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378472    1035 s3context.go:303] Doing GetBucketLocation in "us-east-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378481    1035 s3context.go:303] Doing GetBucketLocation in "sa-east-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378490    1035 s3context.go:303] Doing GetBucketLocation in "ap-southeast-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378498    1035 s3context.go:303] Doing GetBucketLocation in "ap-southeast-2"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.379255    1035 s3context.go:303] Doing GetBucketLocation in "eu-west-2"
Apr  5 08:03:29 ip-172-20-10-182 nodeup[1035]: W0405 08:03:29.375004    1035 main.go:133] got error running nodeup (will retry in 30s): error loading Cluster "s3://****/******/cluster-completed.spec": Could not retrieve location for AWS bucket *****

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2024-04-05T07:23:05Z"
  name: ********
spec:
  additionalPolicies: {}
  api:
    loadBalancer:
      class: Classic
      securityGroupOverride: sg-*****
      type: Public
  assets:
    containerRegistry: *******.dkr.ecr.us-east-1.amazonaws.com/kops
    fileRepository: https://s3.us-east-1.amazonaws.com/******
  authorization:
    rbac: {}
  cloudProvider: aws
  configBase: s3://*****/******
  containerd:
    configOverride: |2
            version = 2
            [plugins]
              [plugins."io.containerd.grpc.v1.cri"]
                sandbox_image = "*****.dkr.ecr.us-east-1.amazonaws.com/kops/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097"
              [plugins."io.containerd.grpc.v1.cri".registry.mirrors."*******.dkr.ecr.us-east-1.amazonaws.com"]
                endpoint = ["https://******.dkr.ecr.us-east-1.amazonaws.com"]
                [plugins."io.containerd.grpc.v1.cri".registry.configs."******.dkr.ecr.us-east-1.amazonaws.com".auth]
                  username = "AWS"
                  password = "******"
                [plugins."io.containerd.grpc.v1.cri".containerd]
                  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
                    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
                      runtime_type = "io.containerd.runc.v2"
                      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
                        SystemdCgroup = true
  dnsZone: *****
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-1
      name: master-1
    name: main
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeProxy:
    enabled: true
  kubelet:
    anonymousAuth: false
  kubernetesVersion: 1.26.4
  masterPublicName: api.*****
  networkCIDR: 172.20.0.0/16
  networkID: vpc-*****
  networking:
    calico: {}
  nodeTerminationHandler:
    enableSpotInterruptionDraining: false
    enabled: false
  nonMasqueradeCIDR: 100.64.0.0/10
  sshKeyName: *****
  subnets:
  - cidr: 172.20.10.0/24
    id: subnet-*****
    name: us-east-1b
    type: Public
    zone: us-east-1b
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-04-05T07:23:08Z"
  labels:
    kops.k8s.io/cluster: *****
  name: master-1
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20240126
  kubelet:
    anonymousAuth: false
    nodeLabels:
      kops.k8s.io/kops-controller-pki: ""
      node-role.kubernetes.io/control-plane: ""
      node.kubernetes.io/exclude-from-external-load-balancers: ""
    taints:
    - node-role.kubernetes.io/control-plane=:NoSchedule
  machineType: m5.xlarge
  manager: CloudGroup
  maxSize: 1
  minSize: 1
  role: Master
  securityGroupOverride: ******
  subnets:
  - us-east-1b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-04-05T07:23:08Z"
  labels:
    kops.k8s.io/cluster: *****
  name: node
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20240126
  kubelet:
    anonymousAuth: false
    nodeLabels:
      node-role.kubernetes.io/node: ""
  machineType: c6i.2xlarge
  manager: CloudGroup
  maxSize: 2
  minSize: 2
  nodeLabels:
    nvidia.com/gpu.deploy.dcgm-exporter: "true"
    nvidia.com/gpu.deploy.device-plugin: "true"
  packages:
  - nfs-common
  role: Node
  securityGroupOverride: sg-*****
  subnets:
  - us-east-1b

I have created a VPC endpoint for S3 of the Interface type, but none of its DNS records include the dualstack name (see the resolution check sketched after the list below):

*.vpce-*****.s3.us-east-1.vpce.amazonaws.com
*.vpce-*****-us-east-1b.s3.us-east-1.vpce.amazonaws.com
s3.us-east-1.amazonaws.com
*.s3.us-east-1.amazonaws.com
*.s3-accesspoint.us-east-1.amazonaws.com
*.s3-control.us-east-1.amazonaws.com
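A quick sanity check from an instance inside the VPC (a sketch, not from the issue) is to compare what the regional and dualstack S3 hostnames resolve to:

# The regional name should resolve to the interface endpoint's private IPs;
# the dualstack name used by nodeup (see the log above) is not in the endpoint's DNS list
# and may still resolve to public S3 addresses that the restricted security group blocks
nslookup s3.us-east-1.amazonaws.com
nslookup s3.dualstack.us-east-1.amazonaws.com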
@k8s-ci-robot added the kind/bug label Apr 5, 2024
@zetaab
Member

zetaab commented Apr 6, 2024

It's not clear to me how this is a kops bug.

@dormullor
Author

There is no way to set up kops for a disconnected environment... I can open a feature request if you want.

@zetaab
Member

zetaab commented Apr 7, 2024

There is a way to install kops in a disconnected environment, but you must copy all the assets first. It can be installed without any internet connectivity; you just need connectivity to a single object storage endpoint.

https://kops.sigs.k8s.io/operations/asset-repository/

You also need to set kops channel: none (I cannot see this in your spec at all, so it is not none in that case; the default value is stable).
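For illustration, a rough sketch of what that looks like in the cluster spec (the registry and bucket values are placeholders patterned on the manifest above, not a verified configuration):

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  channel: none                # do not pull the default "stable" channel from the internet
  assets:
    containerRegistry: <account>.dkr.ecr.us-east-1.amazonaws.com/kops
    fileRepository: https://s3.us-east-1.amazonaws.com/<asset-bucket>

The linked documentation then describes running kops get assets --copy to mirror the file assets and container images into those locations.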

@zetaab
Member

zetaab commented Apr 7, 2024

@dormullor
Author

dormullor commented Apr 12, 2024

@zetaab Although I have copied all asset files and container images into S3 and ECR and configured kops to use them, the nodeup logs still show an error when retrieving cluster-completed.spec from S3, even with an S3 VPC endpoint configured.

That's because kops uses the s3://bucket-name scheme, while the S3 VPC endpoint only exposes the full S3 DNS names (bucket-name.s3.us-east-1.amazonaws.com).

As a result, kops cannot be used in a disconnected environment on AWS.

W0412 06:49:07.558115    1040 main.go:133] got error running nodeup (will retry in 30s): error loading Cluster "s3://kops-state-****/*****/cluster-completed.spec": file does not exist
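One option sometimes used for this situation, not discussed in this thread and worth verifying against the dualstack hostname above, is a Gateway-type S3 endpoint, which works at the route-table level instead of publishing private DNS names. A minimal AWS CLI sketch with placeholder IDs:

# A gateway endpoint adds a route for the regional S3 prefix list, so clients keep
# resolving the public S3 hostnames while the traffic stays inside the VPC
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxx \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-xxxxxxxx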

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 30, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Aug 29, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned Sep 28, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
