Error connecting to kubernetes-proxy #758
I didn't notice this being an issue when testing and validating OpenShift 4.12 with the Splunk OTel Collector Chart v0.72.0.
@jvoravong
Setting proxy.enabled to false stops the errors. Errors:
0.75.0 on docker-desktop k8s also triggers it; I can confirm it can be silenced by disabling the proxy. Still digging to confirm whether it's simply because the proxy is not exposing its port.
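For reference, the workaround both reports describe maps to a single chart value. A minimal values.yaml sketch — assuming the chart exposes the proxy monitor under agent.controlPlaneMetrics, which may vary by chart version; this silences the errors by not collecting kube-proxy metrics at all:

# values.yaml -- a sketch, assuming the agent.controlPlaneMetrics keys
# used by recent chart versions
agent:
  controlPlaneMetrics:
    proxy:
      enabled: false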
@matthewmodestino Docker Desktop is different; you might not have kube-proxy configured to expose metrics correctly. @kishah-lilly did you guys change the default port?
@dloucasfx we did not. It is currently set to 29101. Thanks
Would you be able to post more debug info, like the labels or pod YAML, so we can verify what is happening? It looks like we cover port 29101.
Pod YAML:

kind: Pod
apiVersion: v1
metadata:
generateName: splunk-otel-collector-chart-agent-
annotations:
checksum/config: [REDACTED]
kubectl.kubernetes.io/default-container: otel-collector
openshift.io/scc: splunk-otel-collector-chart
resourceVersion: [REDACTED]
name: splunk-otel-collector-chart-agent-2tmwk
uid: [REDACTED]
creationTimestamp: '2023-04-27T19:01:41Z'
managedFields:
- manager: kube-controller-manager
operation: Update
apiVersion: v1
time: '2023-04-27T19:01:41Z'
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:annotations':
.: {}
'f:checksum/config': {}
'f:kubectl.kubernetes.io/default-container': {}
'f:generateName': {}
'f:labels':
.: {}
'f:app': {}
'f:controller-revision-hash': {}
'f:pod-template-generation': {}
'f:release': {}
'f:ownerReferences':
.: {}
'k:{"uid":[REDACTED]}': {}
'f:spec':
'f:volumes':
.: {}
'k:{"name":"host-dev"}':
.: {}
'f:hostPath':
.: {}
'f:path': {}
'f:type': {}
'f:name': {}
'k:{"name":"host-etc"}':
.: {}
'f:hostPath':
.: {}
'f:path': {}
'f:type': {}
'f:name': {}
'k:{"name":"host-proc"}':
.: {}
'f:hostPath':
.: {}
'f:path': {}
'f:type': {}
'f:name': {}
'k:{"name":"host-run-udev-data"}':
.: {}
'f:hostPath':
.: {}
'f:path': {}
'f:type': {}
'f:name': {}
'k:{"name":"host-sys"}':
.: {}
'f:hostPath':
.: {}
'f:path': {}
'f:type': {}
'f:name': {}
'k:{"name":"host-var-run-utmp"}':
.: {}
'f:hostPath':
.: {}
'f:path': {}
'f:type': {}
'f:name': {}
'k:{"name":"otel-configmap"}':
.: {}
'f:configMap':
.: {}
'f:defaultMode': {}
'f:items': {}
'f:name': {}
'f:name': {}
'f:containers':
'k:{"name":"otel-collector"}':
'f:image': {}
'f:volumeMounts':
.: {}
'k:{"mountPath":"/conf"}':
.: {}
'f:mountPath': {}
'f:name': {}
'k:{"mountPath":"/hostfs/dev"}':
.: {}
'f:mountPath': {}
'f:name': {}
'f:readOnly': {}
'k:{"mountPath":"/hostfs/etc"}':
.: {}
'f:mountPath': {}
'f:name': {}
'f:readOnly': {}
'k:{"mountPath":"/hostfs/proc"}':
.: {}
'f:mountPath': {}
'f:name': {}
'f:readOnly': {}
'k:{"mountPath":"/hostfs/run/udev/data"}':
.: {}
'f:mountPath': {}
'f:name': {}
'f:readOnly': {}
'k:{"mountPath":"/hostfs/sys"}':
.: {}
'f:mountPath': {}
'f:name': {}
'f:readOnly': {}
'k:{"mountPath":"/hostfs/var/run/utmp"}':
.: {}
'f:mountPath': {}
'f:name': {}
'f:readOnly': {}
'f:terminationMessagePolicy': {}
.: {}
'f:resources':
.: {}
'f:limits':
.: {}
'f:cpu': {}
'f:memory': {}
'f:requests':
.: {}
'f:cpu': {}
'f:memory': {}
'f:command': {}
'f:livenessProbe':
.: {}
'f:failureThreshold': {}
'f:httpGet':
.: {}
'f:path': {}
'f:port': {}
'f:scheme': {}
'f:periodSeconds': {}
'f:successThreshold': {}
'f:timeoutSeconds': {}
'f:env':
'k:{"name":"K8S_NODE_NAME"}':
.: {}
'f:name': {}
'f:valueFrom':
.: {}
'f:fieldRef': {}
'k:{"name":"HOST_ETC"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"K8S_POD_NAME"}':
.: {}
'f:name': {}
'f:valueFrom':
.: {}
'f:fieldRef': {}
'k:{"name":"HOST_DEV"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"HOST_PROC_MOUNTINFO"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"SPLUNK_OBSERVABILITY_ACCESS_TOKEN"}':
.: {}
'f:name': {}
'f:valueFrom':
.: {}
'f:secretKeyRef': {}
.: {}
'k:{"name":"K8S_POD_UID"}':
.: {}
'f:name': {}
'f:valueFrom':
.: {}
'f:fieldRef': {}
'k:{"name":"HOST_VAR"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"K8S_NAMESPACE"}':
.: {}
'f:name': {}
'f:valueFrom':
.: {}
'f:fieldRef': {}
'k:{"name":"HOST_RUN"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"HOST_SYS"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"SPLUNK_MEMORY_TOTAL_MIB"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"K8S_NODE_IP"}':
.: {}
'f:name': {}
'f:valueFrom':
.: {}
'f:fieldRef': {}
'k:{"name":"K8S_POD_IP"}':
.: {}
'f:name': {}
'f:valueFrom':
.: {}
'f:fieldRef': {}
'k:{"name":"HOST_PROC"}':
.: {}
'f:name': {}
'f:value': {}
'f:readinessProbe':
.: {}
'f:failureThreshold': {}
'f:httpGet':
.: {}
'f:path': {}
'f:port': {}
'f:scheme': {}
'f:periodSeconds': {}
'f:successThreshold': {}
'f:timeoutSeconds': {}
'f:terminationMessagePath': {}
'f:imagePullPolicy': {}
'f:ports':
.: {}
'k:{"containerPort":4317,"protocol":"TCP"}':
.: {}
'f:containerPort': {}
'f:hostPort': {}
'f:name': {}
'f:protocol': {}
'k:{"containerPort":4318,"protocol":"TCP"}':
.: {}
'f:containerPort': {}
'f:hostPort': {}
'f:name': {}
'f:protocol': {}
'k:{"containerPort":9943,"protocol":"TCP"}':
.: {}
'f:containerPort': {}
'f:hostPort': {}
'f:name': {}
'f:protocol': {}
'k:{"containerPort":55681,"protocol":"TCP"}':
.: {}
'f:containerPort': {}
'f:hostPort': {}
'f:name': {}
'f:protocol': {}
'f:name': {}
'f:dnsPolicy': {}
'f:tolerations': {}
'f:serviceAccount': {}
'f:restartPolicy': {}
'f:schedulerName': {}
'f:hostNetwork': {}
'f:nodeSelector': {}
'f:terminationGracePeriodSeconds': {}
'f:serviceAccountName': {}
'f:enableServiceLinks': {}
'f:securityContext': {}
'f:affinity':
.: {}
'f:nodeAffinity':
.: {}
'f:requiredDuringSchedulingIgnoredDuringExecution': {}
- manager: kubelet
operation: Update
apiVersion: v1
time: '2023-05-01T15:50:56Z'
fieldsType: FieldsV1
fieldsV1:
'f:status':
'f:conditions':
'k:{"type":"ContainersReady"}':
.: {}
'f:lastProbeTime': {}
'f:lastTransitionTime': {}
'f:status': {}
'f:type': {}
'k:{"type":"Initialized"}':
.: {}
'f:lastProbeTime': {}
'f:lastTransitionTime': {}
'f:status': {}
'f:type': {}
'k:{"type":"Ready"}':
.: {}
'f:lastProbeTime': {}
'f:lastTransitionTime': {}
'f:status': {}
'f:type': {}
'f:containerStatuses': {}
'f:hostIP': {}
'f:phase': {}
'f:podIP': {}
'f:podIPs':
.: {}
'k:{"ip":[REDACTED]}':
.: {}
'f:ip': {}
'f:startTime': {}
subresource: status
namespace: splunk-opentelemetry
ownerReferences:
- apiVersion: apps/v1
kind: DaemonSet
name: splunk-otel-collector-chart-agent
uid: [REDACTED]
controller: true
blockOwnerDeletion: true
labels:
app: splunk-otel-collector
controller-revision-hash: [REDACTED]
pod-template-generation: '34'
release: splunk-otel-collector-chart
spec:
nodeSelector:
kubernetes.io/os: linux
restartPolicy: Always
serviceAccountName: splunk-otel-collector-chart
imagePullSecrets:
- name: splunk-otel-collector-chart-dockercfg-j8wv2
priority: 0
schedulerName: default-scheduler
hostNetwork: true
enableServiceLinks: true
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchFields:
- key: metadata.name
operator: In
values:
- [REDACTED]
terminationGracePeriodSeconds: 600
preemptionPolicy: PreemptLowerPriority
nodeName: [REDACTED]
securityContext:
seLinuxOptions:
user: system_u
role: system_r
type: spc_t
level: s0
fsGroup: 1002050000
containers:
- resources:
limits:
cpu: 200m
memory: 500Mi
requests:
cpu: 200m
memory: 500Mi
readinessProbe:
httpGet:
path: /
port: 13133
scheme: HTTP
timeoutSeconds: 1
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
terminationMessagePath: /dev/termination-log
name: otel-collector
command:
- /otelcol
- '--config=/conf/relay.yaml'
livenessProbe:
httpGet:
path: /
port: 13133
scheme: HTTP
timeoutSeconds: 1
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
env:
- name: SPLUNK_MEMORY_TOTAL_MIB
value: '500'
- name: K8S_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: K8S_NODE_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.hostIP
- name: K8S_POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: K8S_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: K8S_POD_UID
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.uid
- name: K8S_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: SPLUNK_OBSERVABILITY_ACCESS_TOKEN
valueFrom:
secretKeyRef:
name: splunk-otel-collector
key: splunk_observability_access_token
- name: HOST_PROC
value: /hostfs/proc
- name: HOST_SYS
value: /hostfs/sys
- name: HOST_ETC
value: /hostfs/etc
- name: HOST_VAR
value: /hostfs/var
- name: HOST_RUN
value: /hostfs/run
- name: HOST_DEV
value: /hostfs/dev
- name: HOST_PROC_MOUNTINFO
value: /proc/self/mountinfo
securityContext:
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
ports:
- name: otlp
hostPort: 4317
containerPort: 4317
protocol: TCP
- name: otlp-http
hostPort: 4318
containerPort: 4318
protocol: TCP
- name: otlp-http-old
hostPort: 55681
containerPort: 55681
protocol: TCP
- name: signalfx
hostPort: 9943
containerPort: 9943
protocol: TCP
imagePullPolicy: IfNotPresent
volumeMounts:
- name: otel-configmap
mountPath: /conf
- name: host-dev
readOnly: true
mountPath: /hostfs/dev
- name: host-etc
readOnly: true
mountPath: /hostfs/etc
- name: host-proc
readOnly: true
mountPath: /hostfs/proc
- name: host-run-udev-data
readOnly: true
mountPath: /hostfs/run/udev/data
- name: host-sys
readOnly: true
mountPath: /hostfs/sys
- name: host-var-run-utmp
readOnly: true
mountPath: /hostfs/var/run/utmp
- name: kube-api-access-wnbhj
readOnly: true
mountPath: /var/run/secrets/kubernetes.io/serviceaccount
terminationMessagePolicy: File
image: 'quay.io/signalfx/splunk-otel-collector:0.75.0'
serviceAccount: splunk-otel-collector-chart
volumes:
- name: host-dev
hostPath:
path: /dev
type: ''
- name: host-etc
hostPath:
path: /etc
type: ''
- name: host-proc
hostPath:
path: /proc
type: ''
- name: host-run-udev-data
hostPath:
path: /run/udev/data
type: ''
- name: host-sys
hostPath:
path: /sys
type: ''
- name: host-var-run-utmp
hostPath:
path: /var/run/utmp
type: ''
- name: otel-configmap
configMap:
name: splunk-otel-collector-chart-otel-agent
items:
- key: relay
path: relay.yaml
defaultMode: 420
- name: kube-api-access-wnbhj
projected:
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
name: kube-root-ca.crt
items:
- key: ca.crt
path: ca.crt
- downwardAPI:
items:
- path: namespace
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- configMap:
name: openshift-service-ca.crt
items:
- key: service-ca.crt
path: service-ca.crt
defaultMode: 420
dnsPolicy: ClusterFirstWithHostNet
tolerations:
  - operator: Exists
@kishah-lilly I wonder if the service is only allowing HTTPS, as we are currently trying HTTP. If it's not caused by HTTPS, can you verify that port? Thanks
I tried adding:

smartagent/kubernetes-proxy:
  config:
    extraDimensions:
      metric_source: kubernetes-proxy
    port: 29101
    type: kubernetes-proxy
    useHTTPS: true
  rule: type == "pod" && labels["app"] == "sdn"

Also tried adding

Here is the output on the node:

$ ss -anpe | grep "29101" | grep "LISTEN"
tcp LISTEN 0 128 127.0.0.1:29101 0.0.0.0:* ino:42391 sk:11f <->

Thanks
We ran into the same issue. The service is listening on port 29101 only on 127.0.0.1, not on the IP of the node itself, while the OTel Collector pod is trying to connect to node-ip:29101 instead of localhost. I think this is the issue, at least for us. Do you agree that this is the general issue? If so, should the solution be to make the OTel Collector connect to localhost instead of the node IP, since it is running on the host network anyway?
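For background, this matches upstream kube-proxy's default behavior: its metrics endpoint binds to 127.0.0.1:10249 unless reconfigured. A KubeProxyConfiguration sketch of the field involved — note this is the stock-Kubernetes analogue; OpenShift's sdn pods expose the equivalent endpoint on 29101 and are configured differently, so this is context, not an OpenShift fix:

# KubeProxyConfiguration excerpt -- a sketch for stock kube-proxy only.
# metricsBindAddress defaults to 127.0.0.1:10249 upstream, which makes
# the endpoint unreachable via the node IP; rebinding it is one way to
# let a scraper reach node-ip:port.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
metricsBindAddress: 0.0.0.0:10249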
@sveno1990 I have seen this happening before in an OpenShift cluster. The receiver is configured to use the discovered pod IP. @jvoravong have you seen this in our lab cluster? @kishah-lilly can you try the loopback address? If it's still not working, please share the errors you are getting.
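Concretely, "try the loopback address" means overriding the host the generated monitor connects to instead of using the discovered pod IP. A sketch of what that looks like in the agent's relay config, mirroring the receiver_creator-style block posted earlier (rule, port, and dimensions are taken from that config; host is the only change, and the surrounding receiver_creator wiring is elided):

# excerpt from the agent relay config -- a sketch of the loopback workaround
receiver_creator:
  receivers:
    smartagent/kubernetes-proxy:
      rule: type == "pod" && labels["app"] == "sdn"
      config:
        type: kubernetes-proxy
        host: 127.0.0.1   # loopback instead of the discovered pod/node IP
        port: 29101
        extraDimensions:
          metric_source: kubernetes-proxy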
OpenShift Notes:
We encountered the issue on OpenShift 4.10. For troubleshooting purposes I manually added "host: localhost" to the config in ConfigMap splunk-otel-collector-otel-agent. That solved the issue; however, it is not a sustainable solution.
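A less manual variant of the same workaround, sketched under the assumption that the chart's agent.config value deep-merges over the generated relay config (so the override would survive chart upgrades, unlike editing the rendered ConfigMap by hand):

# values.yaml -- a sketch; assumes agent.config deep-merges into the
# chart-generated relay.yaml, overriding only the host field
agent:
  config:
    receivers:
      receiver_creator:
        receivers:
          smartagent/kubernetes-proxy:
            config:
              host: localhost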
Confirmed that you can set host to localhost, 0.0.0.0, or 127.0.0.1 with v4.10 and the monitor works. However, warning messages can populate the logs if some of these values are used. Still looking into other solutions.
A fix for this issue on OpenShift v4.10 was released with "Update the Kubernetes Proxy monitor for OpenShift clusters" (#810).
We've implemented fixes for supported Kubernetes distributions. The collector agent's logging configuration has been adjusted to prevent excessive kubernetes-proxy connection errors on untested or unsupported distributions. Additionally, we've expanded the documentation on known kube-proxy issues for clarity. This ticket is now closed.
The change implemented in https://github.com/signalfx/splunk-otel-collector-chart/pull/711/files fixes the kubernetes-scheduler issue; however, it does not fix the kubernetes-proxy issue.
From original post:
Originally posted by @kishah-lilly in #697 (comment)