Network configuration when using private clusters #4868

Closed
akoenig opened this issue Jul 22, 2019 · 21 comments
Labels: area/networking, kind/question (Further information is requested)

Comments

@akoenig

akoenig commented Jul 22, 2019

In what area(s)?

/area networking

Ask your question here:

We encountered an issue when using Knative in a private cluster environment. Consider the following architecture:

We have a cluster for our engineers running in GKE as a private cluster (master and nodes are inaccessible via the Internet). Unfortunately, when applying a Knative service it fails with:

Internal error occurred: failed calling webhook "webhook.serving.knative.dev": Post https://webhook.knative-serving.svc:443/?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Everything works as expected when installing the service on a public cluster. Any help on this is highly appreciated 🙂

@akoenig added the kind/question (Further information is requested) label on Jul 22, 2019
@vagababov
Contributor

This is quite strange, since https://webhook.knative-serving.svc:443/ is definitely a cluster-local address.
Do you have logs from the webhook itself? Did it succeed in registering?
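For reference, a minimal way to pull those logs and check registration (a sketch assuming the default knative-serving namespace and the stock webhook Deployment/Service names):

# Check that the webhook pod is running and that its Service has endpoints
kubectl -n knative-serving get pods
kubectl -n knative-serving get endpoints webhook

# Tail the webhook logs to see whether it started cleanly
kubectl -n knative-serving logs deploy/webhook

# Confirm the admission webhook configuration was registered
kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations | grep knative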

@bbhuston

I am experiencing the exact same issue. I have installed Knative (build/serving/eventing) on 1.11.x, 1.12.x, and 1.13.x private GKE clusters. These clusters have the latest Istio installed and have master authorized networks disabled (I have also tried with these networks enabled), and I am unable to create builds or ksvcs under any scenario. I have also tried installing Knative v0.6.x and v0.7.x under all of the above GKE settings, with no luck either.

@mattmoor
Member

mattmoor commented Aug 3, 2019

Can you share information about how to create a cluster like the one where you are seeing this?

@bbhuston

bbhuston commented Aug 3, 2019

@mattmoor Below are the configurations I'm using to create my GKE cluster and bootstrap it with Knative.

# Generate legacy default auth credential file for use with terraform 
gcloud auth application-default login

# Download latest terraform client, if not already present
brew install terraform

# Create terraform file that uses  [official GCP GKE module](https://registry.terraform.io/modules/terraform-google-modules/kubernetes-engine/google/4.1.0/submodules/beta-private-cluster)
cat << EOF > main.tf
module "gke" {
  source                     = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  version                    = "4.1.0"
  project_id                 = "my-project"
  name                       = "private-gke-cluster-1"
  regional                   = true
  region                     = "us-east1"
  zones                      = ["us-east1-b", "us-east1-c", "us-east1-d"]
  network                    = "default"
  subnetwork                 = "default"
  ip_range_pods              = ""
  ip_range_services          = ""
  http_load_balancing        = true
  horizontal_pod_autoscaling = true
  kubernetes_dashboard       = false
  network_policy             = false
  kubernetes_version         = "1.13.7-gke.8"
  issue_client_certificate   = true
  service_account            =  "my-account@my-project.iam.gserviceaccount.com"
  enable_private_nodes       = true
  enable_private_endpoint    = false
  remove_default_node_pool   = true
  istio                      = true
  cloudrun                   = false
  
  node_pools = [
    {
      name               = "default-node-pool"
      machine_type       = "n1-standard-2"
      min_count          = 1
      max_count          = 100
      disk_size_gb       = 100
      disk_type          = "pd-standard"
      image_type         = "COS"
      auto_repair        = true
      auto_upgrade       = true
      service_account    = "my-account@my-project.iam.gserviceaccount.com"
      preemptible        = false
      initial_node_count = 1
    },
  ]

  node_pools_oauth_scopes = {
    all = []

    default-node-pool = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }

  node_pools_labels = {
    all = {}

    default-node-pool = {
      default-node-pool = "true"
    }
  }

  node_pools_metadata = {
    all = {}

    default-node-pool = {
      node-pool-metadata-custom-value = "my-node-pool"
    }
  }

  node_pools_taints = {
    all = []

    default-node-pool = [
      {
        key    = "default-node-pool"
        value  = "true"
        effect = "PREFER_NO_SCHEDULE"
      },
    ]
  }

  node_pools_tags = {
    all = []

    default-node-pool = [
      "default-node-pool",
    ]
  }
}
EOF


# Create GKE cluster via standard terraform client commands
terraform init
terraform plan
terraform apply

# Manually remove the GKE cluster's master authorized network (as per [this issue](https://github.com/terraform-providers/terraform-provider-google/issues/3098))
gcloud container clusters update private-gke-cluster-1  --region us-east1 --no-enable-master-authorized-networks

# Install knative CRDs as per official guidance
kubectl apply --selector knative.dev/crd-install=true \
--filename https://github.com/knative/serving/releases/download/v0.7.0/serving.yaml \
--filename https://github.com/knative/build/releases/download/v0.7.0/build.yaml \
--filename https://github.com/knative/eventing/releases/download/v0.7.0/release.yaml \
--filename https://github.com/knative/serving/releases/download/v0.7.0/monitoring.yaml

# Install knative controllers etc (sans the monitoring stack)
kubectl apply --filename https://github.com/knative/serving/releases/download/v0.7.0/serving.yaml --selector networking.knative.dev/certificate-provider!=cert-manager \
--filename https://github.com/knative/build/releases/download/v0.7.0/build.yaml \
--filename https://github.com/knative/eventing/releases/download/v0.7.0/release.yaml

# Once knative pods are up, run the following knative-build hello-world example
cat << EOF > hello-knative-build.yaml
apiVersion: build.knative.dev/v1alpha1
kind: Build
metadata:
  name: hello
spec:
  steps:
  - image: busybox
    args: ['echo', 'Hello, World!']
EOF

kubectl apply -f hello-knative-build.yaml

After running kubectl apply on the build manifest, no build resources are ever created on the cluster, and after about 30 seconds I receive the same timeout error message that the OP reported.

@mattmoor
Member

mattmoor commented Aug 4, 2019

cc @tcnghia

@mattmoor
Member

mattmoor commented Aug 4, 2019

Thanks for the detailed repro instructions. Early this week will be a bit chaotic shutting down 0.8, but this should be very helpful in attempting to reproduce what you are seeing so that we can get your problem sorted out.

@bbhuston

@mattmoor Hate to pester you, but I'm curious if there has been any update on the knative + private GKE issue.

@mattmoor
Member

I'll try to find someone to look into it. I pinged @tcnghia , but realized he is out today. Sorry for the delay.

@bbhuston

bbhuston commented Aug 21, 2019 via email

@tcnghia
Contributor

tcnghia commented Aug 21, 2019

I think this is a firewall issue, similar to that of elastic/cloud-on-k8s#1437.

Can you please try the workaround there? Thanks!

@tcnghia
Contributor

tcnghia commented Aug 21, 2019

8443 is the port that you need to allow

https://github.com/knative/serving/blob/master/config/400-webhook-service.yaml#L26
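To double-check on a live cluster, the Service's targetPort is the one the master ultimately has to reach (a quick sketch, using the Service name from the error message above):

kubectl -n knative-serving get svc webhook -o jsonpath='{.spec.ports[*].port} -> {.spec.ports[*].targetPort}{"\n"}'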

@tcnghia
Contributor

tcnghia commented Aug 21, 2019

The short explanation is that a GKE private cluster by default only allows the GKE master to access your Services on port 443 or 80. Our webhook uses 8443 here, so it needs to be whitelisted.

Instructions for that are here: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules

There may be other webhooks, like Istio's, that also need to be whitelisted.
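
For anyone else hitting this, a rough sketch of the firewall rule that doc describes, using the cluster name, region, and network from the repro above (the rule name and the placeholders are only examples and need to match your own setup):

# Find the master's CIDR block and the network tag GKE applied to the nodes
gcloud container clusters describe private-gke-cluster-1 --region us-east1 \
  --format 'value(privateClusterConfig.masterIpv4CidrBlock)'
gcloud compute firewall-rules list \
  --filter 'name~gke-private-gke-cluster-1' \
  --format 'table(name,targetTags.list())'

# Allow the master to reach the webhook's target port 8443 on the nodes
gcloud compute firewall-rules create allow-master-to-knative-webhook \
  --network default \
  --direction INGRESS \
  --action ALLOW \
  --rules tcp:8443 \
  --source-ranges <MASTER_IPV4_CIDR> \
  --target-tags <NODE_NETWORK_TAG>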

@sjmiller609

My identical problem was resolved by @tcnghia's suggestion to add an ingress rule for port 8443 to the firewall.

@tcnghia
Contributor

tcnghia commented Aug 21, 2019

BTW, the reason why 443 wasn't used is to avoid a privileged port (knative/build#604).

I just looked at Istio's webhooks and it looks like they use 443, so there is no need for an additional rule for Istio. 8443 should be enough.
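
If you want to verify which ports other webhooks use (and therefore whether they need their own rule), one way, assuming an Istio install whose sidecar-injector resources carry the usual names, is to look up each webhook's backing Service and its targetPort:

# List admission webhook configurations in the cluster
kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations

# Inspect one to find its clientConfig.service, then check that Service's targetPort
kubectl get mutatingwebhookconfiguration istio-sidecar-injector -o yaml | grep -A 6 'clientConfig:'
kubectl -n istio-system get svc istio-sidecar-injector -o jsonpath='{.spec.ports[*].targetPort}{"\n"}'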

@tcnghia
Contributor

tcnghia commented Aug 21, 2019

@sjmiller609 awesome! thanks a lot for confirmation.

@mattmoor
Member

@bbhuston if you could confirm this works, then we should discuss if/what changes we need to close this out.

@bbhuston

@mattmoor Sorry for the delayed response. Was on an awesome vacation and was a little too lazy to check up on this.

Anyway, I reran the terraform/GKE/Knative setup that I posted above and then manually opened up port 8443 in the cluster's master and worker node firewall rules. And BOOM! It works. Thank you for the follow-up, and please feel free to close this issue.

@tcnghia
Contributor

tcnghia commented Sep 20, 2019

Thanks for confirming.

I think we'll need to update the docs with this information, since avoiding 443 is still a good path (avoiding a privileged port).

/close

@knative-prow-robot
Contributor

@tcnghia: Closing this issue.

In response to this:

Thanks for confirming.

I think we'll need to update the docs with this information, since avoiding 443 is still a good path (avoiding a privileged port).

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ceefour

ceefour commented Apr 2, 2020

I got this issue on microk8s on Windows:

D:\project_amanah\n8n-tutorial>kubectl apply -f serving-n8n.yaml
Error from server: error when creating "serving-n8n.yaml": conversion webhook for serving.knative.dev/v1, Kind=Service failed: Post https://webhook.knative-serving.svc:443/?timeout=30s: dial tcp 10.152.183.205:443: connect: connection refused

Any suggestions on what I should do to start diagnosing the cause and finding alternatives?
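
In case it helps with triage: "connection refused" usually means nothing is listening behind the Service. A reasonable first step, assuming the stock knative-serving install, would be to check whether the webhook pod is actually running and has endpoints:

kubectl -n knative-serving get pods
kubectl -n knative-serving get endpoints webhook
kubectl -n knative-serving logs deploy/webhook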
