
dns lookup error to kubernetes.default.svc in-cluster when using hostNetwork #953

Closed
withlin opened this issue Jul 11, 2022 · 8 comments
Labels
bug (Something isn't working) · client (http issues with the client) · help wanted (Not immediately prioritised, please help!) · invalid (rejected as a valid issue)

Comments

withlin commented Jul 11, 2022

Current and expected behavior

08:15:51 [DEBUG] (6) kube_client::client::builder: [/root/.cargo/registry/src/mirrors.ustc.edu.cn-61ef6e0cd06fb9b8/kube-client-0.74.0/src/client/builder.rs:126] HTTP; http.method=GET http.url=https://kubernetes.default.svc/api/v1/pods? otel.name="list" otel.kind="client"
08:15:51 [DEBUG] (6) kube_client::client::builder: [/root/.cargo/registry/src/mirrors.ustc.edu.cn-61ef6e0cd06fb9b8/kube-client-0.74.0/src/client/builder.rs:137] requesting
08:15:51 [DEBUG] (10) hyper::client::connect::dns: [/root/.cargo/registry/src/mirrors.ustc.edu.cn-61ef6e0cd06fb9b8/hyper-0.14.20/src/client/connect/dns.rs:122] resolving host="kubernetes.default.svc"
08:15:51 [DEBUG] (6) kube_client::client::builder: [/root/.cargo/registry/src/mirrors.ustc.edu.cn-61ef6e0cd06fb9b8/kube-client-0.74.0/src/client/builder.rs:126] HTTP; otel.status_code="ERROR"
08:15:51 [DEBUG] (6) tower::buffer::worker: [/root/.cargo/registry/src/mirrors.ustc.edu.cn-61ef6e0cd06fb9b8/tower-0.4.13/src/buffer/worker.rs:197] service.ready=true message=processing request
08:15:51 [DEBUG] (6) kube_client::client::builder: [/root/.cargo/registry/src/mirrors.ustc.edu.cn-61ef6e0cd06fb9b8/kube-client-0.74.0/src/client/builder.rs:126] HTTP; http.method=GET http.url=https://kubernetes.default.svc/api/v1/pods? otel.name="list" otel.kind="client"
08:15:51 [DEBUG] (6) kube_client::client::builder: [/root/.cargo/registry/src/mirrors.ustc.edu.cn-61ef6e0cd06fb9b8/kube-client-0.74.0/src/client/builder.rs:137] requesting
08:15:51 [ERROR] kube_client::client::builder: [/root/.cargo/registry/src/mirrors.ustc.edu.cn-61ef6e0cd06fb9b8/kube-client-0.74.0/src/client/builder.rs:164] failed with error error trying to connect: dns error: failed to lookup address information: Name or service not known
08:15:51 [DEBUG] (11) hyper::client::connect::dns: [/root/.cargo/registry/src/mirrors.ustc.edu.cn-61ef6e0cd06fb9b8/hyper-0.14.20/src/client/connect/dns.rs:122] resolving host="kubernetes.default.svc"

Daemonset

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: traffic-billing
  namespace: kube-system
  labels:
    app: traffic-billing
spec:
  selector:
    matchLabels:
      name: ftraffic-billing
  template:
    metadata:
      labels:
        name: ftraffic-billing
    spec:
      containers:
      - name: traffic-billing
        image: xxxx:latest
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          privileged: true
      terminationGracePeriodSeconds: 30
      serviceAccountName: traffic-billing
      serviceAccount: traffic-billing
      hostNetwork: true

rbac

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: traffic-billing
rules:
  - apiGroups: ["*"]
    resources: ["pods"]
    verbs: ["get", "watch", "list"]

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: traffic-billing
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: traffic-billing
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traffic-billing
subjects:
  - kind: ServiceAccount
    name: traffic-billing
    namespace: kube-system

source code

use k8s_openapi::api::core::v1::Pod;
use kube::{Api, Client};

let client = Client::try_default().await?;
let api = Api::<Pod>::all(client);

Possible solution

No response

Additional context

No response

Environment

➜ kubectl version --short

Client Version: v1.23.6
Server Version: v1.20.7
WARNING: version difference between client (1.23) and server (1.20) exceeds the supported minor version skew of +/-1

Configuration and features

kube = { version = "0.74.0", default-features = false, features = ["client", "rustls-tls", "derive", "runtime"] }
k8s-openapi = { version = "0.15.0", features = ["v1_24"] }

Affected crates

No response

Would you like to work on fixing this bug?

No response

@withlin withlin added the bug Something isn't working label Jul 11, 2022
withlin (Author) commented Jul 11, 2022

Maybe daemonsets are not supported?

clux (Member) commented Jul 11, 2022

There's nothing special about daemonsets from our POV. It should work. The error you are getting is actually from just failing to communicate with the cluster:

dns error: failed to lookup address information: Name or service not known

rustls should work in-cluster, so it should not be that (although you could try openssl instead). Your daemonset selectors are a bit weird (app vs name, and traffic-billing vs ftraffic-billing), but that also should not be relevant.

Just to rule out something about the cluster: you do have the default kubernetes service, right? k get service kubernetes -n default. Not sure I see anything obvious here.

withlin (Author) commented Jul 11, 2022

(app vs name, and traffic-billing vs ftraffic-billing)

I have corrected it, but it still fails:

➜ kubectl get svc -n default
NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
kubernetes                                ClusterIP   10.233.0.1      <none>        443/TCP          146d

BTW, when I use a Deployment instead of a DaemonSet, it works for me! cc @clux

clux (Member) commented Jul 11, 2022

oh, interesting. if it works with a Deployment instead then it must be something with the yaml or the cluster that's different. maybe something network policy related?

I double-checked in controller-rs by changing the deployment to a daemonset:

diff --git yaml/deployment.yaml yaml/deployment.yaml
index 236a9b0..f002457 100644
--- yaml/deployment.yaml
+++ yaml/deployment.yaml
@@ -61,14 +61,14 @@ spec:
 ---
 # Main deployment
 apiVersion: apps/v1
-kind: Deployment
+kind: DaemonSet
 metadata:
   name: doc-controller
   namespace: default
   labels:
     app: doc-controller
 spec:
-  replicas: 1
+  #replicas: 1
   selector:
     matchLabels:
       app: doc-controller
@@ -83,7 +83,7 @@ spec:
       serviceAccountName: doc-controller
       containers:
       - name: doc-controller
-        image: clux/controller:otel
+        image: clux/controller:latest
         imagePullPolicy: Always
         resources:
           limits:

and that controller worked perfectly. you could try that one to check a system that works, but otherwise not really sure why it's failing for you 🤔

withlin (Author) commented Jul 12, 2022

oh, interesting. if it works with a Deployment instead then it must be something with the yaml or the cluster that's different. maybe something network policy related?


There's one key thing you didn't notice: your test doesn't set hostNetwork: true. With hostNetwork: false it works perfectly; with hostNetwork: true it fails. @clux

clux (Member) commented Jul 12, 2022

Wow. That's new; indeed it does not work with hostNetwork: true, here on default tls:

HyperError(hyper::Error(Connect, ConnectError("dns error", Custom { kind: Uncategorized, error: "failed to lookup address information: Name does not resolve" })))', /volume/src/manager.rs:232:14

Now, I'm not sure if that's a problem or not. Maybe you have to talk to the apiserver with the old method when it's using host-networking? I don't see any immediately similar bugs in kubernetes org but i didn't look very hard.
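
One workaround along those lines (a sketch of my own, not something shown in this thread) is to build the apiserver URL from the well-known in-cluster environment variables KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT instead of the DNS name, which sidesteps cluster DNS entirely:

```rust
use std::env;

/// Compose an apiserver base URL from a host/port pair.
fn apiserver_url(host: &str, port: &str) -> String {
    format!("https://{host}:{port}")
}

/// Read the env vars the kubelet injects into every pod. Unlike
/// `kubernetes.default.svc`, this path needs no cluster DNS at all,
/// so it would also work under `hostNetwork: true`.
fn apiserver_url_from_env() -> Option<String> {
    let host = env::var("KUBERNETES_SERVICE_HOST").ok()?;
    let port = env::var("KUBERNETES_SERVICE_PORT").ok()?;
    Some(apiserver_url(&host, &port))
}

fn main() {
    match apiserver_url_from_env() {
        Some(url) => println!("in-cluster apiserver: {url}"),
        // Outside a cluster the vars are unset; show the shape instead.
        None => println!("{}", apiserver_url("10.233.0.1", "443")),
    }
}
```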

Will update the title of this bug to more correctly reflect the situation. Thanks for the report. Glad you got it working :-)
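
The likely mechanism (my reading; the thread doesn't spell it out): kubernetes.default.svc is a partial name that only resolves through the search domains in the pod's /etc/resolv.conf. With hostNetwork: true and the default dnsPolicy, the pod inherits the node's resolv.conf, which lacks the cluster search path, so the lookup fails. A sketch of that search-domain expansion:

```rust
/// Expand a partial DNS name against resolv.conf-style search domains,
/// mimicking roughly what a resolver does when the name has fewer dots
/// than the configured `ndots` threshold.
fn expand(name: &str, search: &[&str]) -> Vec<String> {
    if name.ends_with('.') {
        // Already fully qualified; tried as-is only.
        return vec![name.trim_end_matches('.').to_string()];
    }
    // Search domains are tried first, then the bare name.
    search
        .iter()
        .map(|d| format!("{name}.{d}"))
        .chain(std::iter::once(name.to_string()))
        .collect()
}

fn main() {
    // Typical in-cluster search path for a pod in the default namespace.
    let search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"];
    for candidate in expand("kubernetes.default.svc", &search) {
        println!("{candidate}");
    }
}
```

On the node's resolv.conf the cluster.local search path is absent, so none of the candidates hit CoreDNS and the bare name does not exist in public DNS.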

@clux clux changed the title failed with error error trying to connect: dns error: failed to lookup address information: Name or service not known dns lookup error to kubernetes.default.svc in-cluster when using hostNetwork Jul 12, 2022
@clux clux added help wanted Not immediately prioritised, please help! question Direction unclear; possibly a bug, possibly could be improved. client http issues with the client labels Jul 12, 2022
kazk (Member) commented Jul 12, 2022

Maybe you need dnsPolicy: ClusterFirstWithHostNet?

https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy

Try adding that under hostNetwork: true.
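
Applied to the DaemonSet from this issue, that suggestion would look something like this (only the relevant pod-spec fields shown):

```yaml
spec:
  template:
    spec:
      hostNetwork: true
      # With hostNetwork, the pod inherits the node's /etc/resolv.conf by
      # default, so cluster names like kubernetes.default.svc don't resolve.
      # ClusterFirstWithHostNet restores the cluster DNS configuration.
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: traffic-billing
```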

clux (Member) commented Jul 12, 2022

Ahh, nice find. Yes that does indeed fix it! Working diff in controller-rs:

--- yaml/deployment.yaml
+++ yaml/deployment.yaml
@@ -81,9 +81,11 @@ spec:
         prometheus.io/port: "8080"
     spec:
       serviceAccountName: doc-controller
+      hostNetwork: true
+      dnsPolicy: ClusterFirstWithHostNet
       containers:
       - name: doc-controller
-        image: clux/controller:otel
+        image: clux/controller:latest
         imagePullPolicy: Always
         resources:
           limits:

Going to close this as invalid as it's not a bug on our end.
