Kops on a disconnected environment #16453

Closed
dormullor opened this issue Apr 5, 2024 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dormullor

dormullor commented Apr 5, 2024

/kind bug

1. What kops version are you running? The command kops version will display this information.

1.26.3

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.26.4

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
Manage your own security group and allow egress traffic only for internal communication (block 0.0.0.0/0 and allow only the VPC CIDR), then run:

 kops update cluster **** --yes --lifecycle-overrides SecurityGroup=Ignore,SecurityGroupRule=Ignore
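For context, a minimal sketch of that egress restriction using the AWS CLI (the group ID is a placeholder; 172.20.0.0/16 matches the networkCIDR in the manifest below):

# Drop the default allow-all egress rule, then allow egress only within the VPC CIDR
aws ec2 revoke-security-group-egress \
  --group-id sg-xxxxxxxx \
  --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'
aws ec2 authorize-security-group-egress \
  --group-id sg-xxxxxxxx \
  --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=172.20.0.0/16}]'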

5. What happened after the commands executed?
The command exceeded its timeout.

6. What did you expect to happen?
When SSHing into the master node, the nodeup process exits with the following error:

Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.305209    1035 s3context.go:192] unable to get bucket location from region "us-east-1"; scanning all regions: RequestError: send request failed
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: caused by: Get "https://s3.dualstack.us-east-1.amazonaws.com/r*****?location=": dial tcp 52.217.230.168:443: i/o timeout
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374846    1035 s3context.go:298] Querying S3 for bucket location for ****
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374904    1035 s3context.go:303] Doing GetBucketLocation in "eu-west-3"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374911    1035 s3context.go:303] Doing GetBucketLocation in "us-west-2"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374930    1035 s3context.go:303] Doing GetBucketLocation in "eu-west-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.375066    1035 s3context.go:303] Doing GetBucketLocation in "ca-central-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378346    1035 s3context.go:303] Doing GetBucketLocation in "ap-northeast-3"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378520    1035 s3context.go:303] Doing GetBucketLocation in "us-east-2"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378718    1035 s3context.go:303] Doing GetBucketLocation in "eu-south-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378767    1035 s3context.go:303] Doing GetBucketLocation in "us-west-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378885    1035 s3context.go:303] Doing GetBucketLocation in "eu-central-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378406    1035 s3context.go:303] Doing GetBucketLocation in "ap-south-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378418    1035 s3context.go:303] Doing GetBucketLocation in "eu-north-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378439    1035 s3context.go:303] Doing GetBucketLocation in "ap-northeast-2"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378454    1035 s3context.go:303] Doing GetBucketLocation in "ap-northeast-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378472    1035 s3context.go:303] Doing GetBucketLocation in "us-east-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378481    1035 s3context.go:303] Doing GetBucketLocation in "sa-east-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378490    1035 s3context.go:303] Doing GetBucketLocation in "ap-southeast-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378498    1035 s3context.go:303] Doing GetBucketLocation in "ap-southeast-2"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.379255    1035 s3context.go:303] Doing GetBucketLocation in "eu-west-2"
Apr  5 08:03:29 ip-172-20-10-182 nodeup[1035]: W0405 08:03:29.375004    1035 main.go:133] got error running nodeup (will retry in 30s): error loading Cluster "s3://****/******/cluster-completed.spec": Could not retrieve location for AWS bucket *****

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2024-04-05T07:23:05Z"
  name: ********
spec:
  additionalPolicies: {}
  api:
    loadBalancer:
      class: Classic
      securityGroupOverride: sg-*****
      type: Public
  assets:
    containerRegistry: *******.dkr.ecr.us-east-1.amazonaws.com/kops
    fileRepository: https://s3.us-east-1.amazonaws.com/******
  authorization:
    rbac: {}
  cloudProvider: aws
  configBase: s3://*****/******
  containerd:
    configOverride: |2
            version = 2
            [plugins]
              [plugins."io.containerd.grpc.v1.cri"]
                sandbox_image = "*****.dkr.ecr.us-east-1.amazonaws.com/kops/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097"
              [plugins."io.containerd.grpc.v1.cri".registry.mirrors."*******.dkr.ecr.us-east-1.amazonaws.com"]
                endpoint = ["https://******.dkr.ecr.us-east-1.amazonaws.com"]
                [plugins."io.containerd.grpc.v1.cri".registry.configs."******.dkr.ecr.us-east-1.amazonaws.com".auth]
                  username = "AWS"
                  password = "******"
                [plugins."io.containerd.grpc.v1.cri".containerd]
                  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
                    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
                      runtime_type = "io.containerd.runc.v2"
                      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
                        SystemdCgroup = true
  dnsZone: *****
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-1
      name: master-1
    name: main
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeProxy:
    enabled: true
  kubelet:
    anonymousAuth: false
  kubernetesVersion: 1.26.4
  masterPublicName: api.*****
  networkCIDR: 172.20.0.0/16
  networkID: vpc-*****
  networking:
    calico: {}
  nodeTerminationHandler:
    enableSpotInterruptionDraining: false
    enabled: false
  nonMasqueradeCIDR: 100.64.0.0/10
  sshKeyName: *****
  subnets:
  - cidr: 172.20.10.0/24
    id: subnet-*****
    name: us-east-1b
    type: Public
    zone: us-east-1b
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-04-05T07:23:08Z"
  labels:
    kops.k8s.io/cluster: *****
  name: master-1
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20240126
  kubelet:
    anonymousAuth: false
    nodeLabels:
      kops.k8s.io/kops-controller-pki: ""
      node-role.kubernetes.io/control-plane: ""
      node.kubernetes.io/exclude-from-external-load-balancers: ""
    taints:
    - node-role.kubernetes.io/control-plane=:NoSchedule
  machineType: m5.xlarge
  manager: CloudGroup
  maxSize: 1
  minSize: 1
  role: Master
  securityGroupOverride: ******
  subnets:
  - us-east-1b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-04-05T07:23:08Z"
  labels:
    kops.k8s.io/cluster: *****
  name: node
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20240126
  kubelet:
    anonymousAuth: false
    nodeLabels:
      node-role.kubernetes.io/node: ""
  machineType: c6i.2xlarge
  manager: CloudGroup
  maxSize: 2
  minSize: 2
  nodeLabels:
    nvidia.com/gpu.deploy.dcgm-exporter: "true"
    nvidia.com/gpu.deploy.device-plugin: "true"
  packages:
  - nfs-common
  role: Node
  securityGroupOverride: sg-*****
  subnets:
  - us-east-1b

I have created a VPC endpoint for S3 of the Interface type, but none of its DNS records include the dualstack name (see the resolution check sketched after the list below):

*.vpce-*****.s3.us-east-1.vpce.amazonaws.com
*.vpce-*****-us-east-1b.s3.us-east-1.vpce.amazonaws.com
s3.us-east-1.amazonaws.com
*.s3.us-east-1.amazonaws.com
*.s3-accesspoint.us-east-1.amazonaws.com
*.s3-control.us-east-1.amazonaws.com
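A quick sanity check from an instance inside the VPC (a sketch, not from the issue) is to compare what the regional and dualstack S3 hostnames resolve to:

# The regional name should resolve to the interface endpoint's private IPs;
# the dualstack name used by nodeup (see the log above) is not in the endpoint's DNS list
# and may still resolve to public S3 addresses that the restricted security group blocks
nslookup s3.us-east-1.amazonaws.com
nslookup s3.dualstack.us-east-1.amazonaws.com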
@k8s-ci-robot added the kind/bug label Apr 5, 2024
@zetaab
Member

zetaab commented Apr 6, 2024

It's not clear to me how this is a kops bug.

@dormullor
Author

There is no way to set up kops for a disconnected environment... I can open a feature request if you want.

@zetaab
Member

zetaab commented Apr 7, 2024

There is a way to install kops in a disconnected environment, but you must copy all the assets first. It can be installed without any internet connectivity; you just need connectivity to a single object storage endpoint.

https://kops.sigs.k8s.io/operations/asset-repository/

You also need to set kops channel: none (I cannot see this in your spec at all, so it is not none in that case; the default value is stable).
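For illustration, a rough sketch of what that looks like in the cluster spec (the registry and bucket values are placeholders patterned on the manifest above, not a verified configuration):

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  channel: none                # do not pull the default "stable" channel from the internet
  assets:
    containerRegistry: <account>.dkr.ecr.us-east-1.amazonaws.com/kops
    fileRepository: https://s3.us-east-1.amazonaws.com/<asset-bucket>

The linked documentation then describes running kops get assets --copy to mirror the file assets and container images into those locations.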

@zetaab
Member

zetaab commented Apr 7, 2024

@dormullor
Author

dormullor commented Apr 12, 2024

@zetaab Although I have copied all asset files and container images into S3 and ECR and configured kops to use them, the nodeup logs still show an error when retrieving cluster-completed.spec from S3, even with an S3 VPC endpoint configured.

That's because kops uses the s3://bucket-name scheme, while the S3 VPC endpoint only exposes the full S3 DNS names (bucket-name.s3.us-east-1.amazonaws.com).

As a result, kops cannot be used in a disconnected environment on AWS.

W0412 06:49:07.558115    1040 main.go:133] got error running nodeup (will retry in 30s): error loading Cluster "s3://kops-state-****/*****/cluster-completed.spec": file does not exist
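One option sometimes used for this situation, not discussed in this thread and worth verifying against the dualstack hostname above, is a Gateway-type S3 endpoint, which works at the route-table level instead of publishing private DNS names. A minimal AWS CLI sketch with placeholder IDs:

# A gateway endpoint adds a route for the regional S3 prefix list, so clients keep
# resolving the public S3 hostnames while the traffic stays inside the VPC
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxx \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-xxxxxxxx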

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 30, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Aug 29, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned Sep 28, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
