[Bug]: Timeout while requesting K3S cluster #17175

Closed
didlawowo opened this issue Mar 15, 2024 · 1 comment
@didlawowo

Bug description

I have installed Netdata using the Helm chart in a K3s cluster with seven nodes.
It works like a charm, but it produces many errors like these:


netdata time=2024-03-15T11:11:17.010Z level=error msg="context deadline exceeded (Client.Timeout or context cancellation while reading body)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:11:20.002Z level=error msg="Get \"https://localhost:10250/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:11:37.677+00:00 comm=netdata source=daemon level=error tid=2875749 thread=P[cgroups] msg="child pid 102340 exited with code 3."
netdata time=2024-03-15T11:11:45.001Z level=error msg="Get \"https://localhost:10250/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:11:50.000Z level=error msg="Get \"https://localhost:10250/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:11:52.003Z level=error msg="Get \"https://localhost:10250/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:11:56.002Z level=error msg="Get \"https://localhost:10250/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:12:02.001Z level=error msg="Get \"https://localhost:10250/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:12:05.002Z level=error msg="Get \"https://localhost:10250/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:12:11.004Z level=error msg="context deadline exceeded (Client.Timeout or context cancellation while reading body)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:12:20.002Z level=error msg="Get \"https://localhost:10250/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:12:22.002Z level=error msg="Get \"https://localhost:10250/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:12:26.002Z level=error msg="Get \"https://localhost:10250/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:12:42.002Z level=error msg="Get \"https://localhost:10250/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:12:50.002Z level=error msg="Get \"https://localhost:10250/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:12:52.003Z level=error msg="context deadline exceeded (Client.Timeout or context cancellation while reading body)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:12:53.004Z level=error msg="context deadline exceeded (Client.Timeout or context cancellation while reading body)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet
netdata time=2024-03-15T11:18:23.518+00:00 comm=netdata source=daemon level=error tid=2875749 thread=P[cgroups] msg="child pid 112911 exited with code 3."
netdata time=2024-03-15T11:19:34.001Z level=error msg="context deadline exceeded (Client.Timeout or context cancellation while reading body)" plugin=go.d collector=k8s_kubelet job=k8s_kubelet

On the K3s nodes, port 10250 is open and serves /metrics.

Expected behavior

No timeout errors, or a way to disable the kubelet collector on the child agents.

Steps to reproduce

  1. Install the chart through an Argo CD application, with default values except for the ingress, e.g.:

         destination:
           server: https://kubernetes.default.svc
           namespace: kube-monitoring
         project: monitoring
         source:
           repoURL: https://netdata.github.io/helmchart/
           targetRevision: 3.7.84
           chart: netdata
           helm:
             values: |
               ingress:
                 enabled: true
                 annotations:
                   cert-manager.io/cluster-issuer: letsencrypt
                   kubernetes.io/ingress.class: traefik
                 path: /
                 pathType: Prefix
                 hosts:
                   - netdata.home.oursain.net
                 spec:
                   ingressClassName: traefik
                 tls:
                   - secretName: netdata-tls
                     hosts:
                       - netdata.home.oursain.net

  2. Look at the logs of the different pods.

...

Installation method

helmchart (kubernetes)

System info

Node listing:

NAME           STATUS   ROLES                       AGE     VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION        CONTAINER-RUNTIME
amd            Ready    <none>                      4d12h   v1.28.7+k3s1   192.168.1.27   <none>        Ubuntu 22.04.4 LTS               5.15.0-100-generic    containerd://1.7.11-k3s2
master         Ready    control-plane,etcd,master   4d15h   v1.28.7+k3s1   192.168.1.21   <none>        Debian GNU/Linux 12 (bookworm)   6.1.0-rpi8-rpi-2712   containerd://1.7.11-k3s2
nvidia         Ready    <none>                      4d15h   v1.28.7+k3s1   192.168.1.80   <none>        Ubuntu 23.10                     6.5.0-25-generic      containerd://1.7.11-k3s2
raspberrypi3   Ready    <none>                      2d3h    v1.28.7+k3s1   192.168.1.23   <none>        Debian GNU/Linux 12 (bookworm)   6.6.20+rpt-rpi-2712   containerd://1.7.11-k3s2
raspberrypi4   Ready    control-plane,etcd,master   4d15h   v1.28.7+k3s1   192.168.1.24   <none>        Debian GNU/Linux 12 (bookworm)   6.1.0-rpi8-rpi-2712   containerd://1.7.11-k3s2
raspberrypi5   Ready    control-plane,etcd,master   4d15h   v1.28.7+k3s1   192.168.1.25   <none>        Debian GNU/Linux 12 (bookworm)   6.1.0-rpi8-rpi-2712   containerd://1.7.11-k3s2

uname -a; grep -HvE "^#|URL" /etc/*release
Linux master 6.1.0-rpi8-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.73-1+rpt1 (2024-01-25) aarch64 GNU/Linux
/etc/os-release:PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
/etc/os-release:NAME="Debian GNU/Linux"
/etc/os-release:VERSION_ID="12"
/etc/os-release:VERSION="12 (bookworm)"
/etc/os-release:VERSION_CODENAME=bookworm
/etc/os-release:ID=debian

Netdata build info

netdata -W buildinfo
Packaging:
    Netdata Version ____________________________________________ : v1.44.3
    Installation Type __________________________________________ : oci
    Package Architecture _______________________________________ : aarch64
    Package Distro _____________________________________________ : unknown
    Configure Options __________________________________________ :  '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/libexec' '--libdir=/usr/lib' '--with-math' '--with-user=netdata' '--without-bundled-protobuf' '--disable-ebpf' '--disable-dependency-tracking' '--enable-lto' 'CFLAGS=-ffunction-sections -fdata-sections -O2 -funroll-loops -pipe -DFLB_HAVE_INOTIFY' 'LDFLAGS=-Wl,--gc-sections'
Default Directories:
    User Configurations ________________________________________ : /etc/netdata
    Stock Configurations _______________________________________ : /usr/lib/netdata/conf.d
    Ephemeral Databases (metrics data, metadata) _______________ : /var/cache/netdata
    Permanent Databases ________________________________________ : /var/lib/netdata
    Plugins ____________________________________________________ : /usr/libexec/netdata/plugins.d
    Static Web Files ___________________________________________ : /usr/share/netdata/web
    Log Files __________________________________________________ : /var/log/netdata
    Lock Files _________________________________________________ : /var/lib/netdata/lock
    Home _______________________________________________________ : /var/lib/netdata
Operating System:
    Kernel _____________________________________________________ : Linux
    Kernel Version _____________________________________________ : 6.1.0-rpi8-rpi-2712
    Operating System ___________________________________________ : Debian GNU/Linux
    Operating System ID ________________________________________ : debian
    Operating System ID Like ___________________________________ : unknown
    Operating System Version ___________________________________ : 12 (bookworm)
    Operating System Version ID ________________________________ : 12
    Detection __________________________________________________ : /host/etc/os-release
Hardware:
    CPU Cores __________________________________________________ : 4
    CPU Frequency ______________________________________________ : 2400000000
    RAM Bytes __________________________________________________ : 8444936192
    Disk Capacity ______________________________________________ : 153635258368
    CPU Architecture ___________________________________________ : aarch64
    Virtualization Technology __________________________________ : unknown
    Virtualization Detection ___________________________________ : none
Container:
    Container __________________________________________________ : container
    Container Detection ________________________________________ : kubernetes
    Container Orchestrator _____________________________________ : kubernetes
    Container Operating System _________________________________ : Debian GNU/Linux
    Container Operating System ID ______________________________ : debian
    Container Operating System ID Like _________________________ : unknown
    Container Operating System Version _________________________ : 12 (bookworm)
    Container Operating System Version ID ______________________ : 12
    Container Operating System Detection _______________________ : /etc/os-release
Features:
    Built For __________________________________________________ : Linux
    Netdata Cloud ______________________________________________ : YES
    Health (trigger alerts and send notifications) _____________ : YES
    Streaming (stream metrics to parent Netdata servers) _______ : YES
    Back-filling (of higher database tiers) ____________________ : YES
    Replication (fill the gaps of parent Netdata servers) ______ : YES
    Streaming and Replication Compression ______________________ : YES (zstd lz4 gzip)
    Contexts (index all active and archived metrics) ___________ : YES
    Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
    Machine Learning ___________________________________________ : YES
Database Engines:
    dbengine ___________________________________________________ : YES
    alloc ______________________________________________________ : YES
    ram ________________________________________________________ : YES
    map ________________________________________________________ : YES
    save _______________________________________________________ : YES
    none _______________________________________________________ : YES
Connectivity Capabilities:
    ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : YES
    static (Netdata internal web server) _______________________ : YES
    h2o (web server) ___________________________________________ : YES
    WebRTC (experimental) ______________________________________ : NO
    Native HTTPS (TLS Support) _________________________________ : YES
    TLS Host Verification ______________________________________ : YES
Libraries:
    LZ4 (extremely fast lossless compression algorithm) ________ : YES
    ZSTD (fast, lossless compression algorithm) ________________ : YES
    zlib (lossless data-compression library) ___________________ : YES
    Judy (high-performance dynamic arrays and hashtables) ______ : YES (bundled)
    dlib (robust machine learning toolkit) _____________________ : YES (bundled)
    protobuf (platform-neutral data serialization protocol) ____ : YES (system)
    OpenSSL (cryptography) _____________________________________ : YES
    libdatachannel (stand-alone WebRTC data channels) __________ : NO
    JSON-C (lightweight JSON manipulation) _____________________ : YES
    libcap (Linux capabilities system operations) ______________ : NO
    libcrypto (cryptographic functions) ________________________ : YES
    libm (mathematical functions) ______________________________ : YES
    jemalloc ___________________________________________________ : NO
    TCMalloc ___________________________________________________ : NO
Plugins:
    apps (monitor processes) ___________________________________ : YES
    cgroups (monitor containers and VMs) _______________________ : YES
    cgroup-network (associate interfaces to CGROUPS) ___________ : YES
    proc (monitor Linux systems) _______________________________ : YES
    tc (monitor Linux network QoS) _____________________________ : YES
    diskspace (monitor Linux mount points) _____________________ : YES
    freebsd (monitor FreeBSD systems) __________________________ : NO
    macos (monitor MacOS systems) ______________________________ : NO
    statsd (collect custom application metrics) ________________ : YES
    timex (check system clock synchronization) _________________ : YES
    idlejitter (check system latency and jitter) _______________ : YES
    bash (support shell data collection jobs - charts.d) _______ : YES
    debugfs (kernel debugging metrics) _________________________ : YES
    cups (monitor printers and print jobs) _____________________ : NO
    ebpf (monitor system calls) ________________________________ : NO
    freeipmi (monitor enterprise server H/W) ___________________ : YES
    nfacct (gather netfilter accounting) _______________________ : NO
    perf (collect kernel performance events) ___________________ : YES
    slabinfo (monitor kernel object caching) ___________________ : YES
    Xen ________________________________________________________ : NO
    Xen VBD Error Tracking _____________________________________ : NO
    Logs Management ____________________________________________ : YES
Exporters:
    AWS Kinesis ________________________________________________ : NO
    GCP PubSub _________________________________________________ : NO
    MongoDB ____________________________________________________ : YES
    Prometheus (OpenMetrics) Exporter __________________________ : YES
    Prometheus Remote Write ____________________________________ : YES
    Graphite ___________________________________________________ : YES
    Graphite HTTP / HTTPS ______________________________________ : YES
    JSON _______________________________________________________ : YES
    JSON HTTP / HTTPS __________________________________________ : YES
    OpenTSDB ___________________________________________________ : YES
    OpenTSDB HTTP / HTTPS ______________________________________ : YES
    All Metrics API ____________________________________________ : YES
    Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
    Trace All Netdata Allocations (with charts) ________________ : NO
    Developer Mode (more runtime checks, slower) _______________ : NO

Additional info

No response

didlawowo added the bug and needs triage (Issues which need to be manually labelled) labels Mar 15, 2024
ilyam8 added the question label and removed the bug and needs triage labels Mar 15, 2024
@ilyam8
Member

ilyam8 commented Mar 15, 2024

The collector is expected to log an error if it fails to collect metrics. You can:

  • change the data collection frequency and increase the timeout (both default to 1 second).
  • disable the kubelet collector (change this to false).
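For illustration only, the two suggestions above might translate into Helm values roughly as follows, for example merged into the `values: |` block of the Argo CD application from step 1. This is a sketch under assumptions not confirmed in this thread: that the chart exposes the collector under `child.configs.kubelet` with `enabled`, `path` and `data` keys, that the go.d `k8s_kubelet` job accepts `update_every` and `timeout`, and that the file path shown is where the chart writes this config. The 10-second interval and 5-second timeout are hypothetical values. Check the `values.yaml` of the chart version in use before applying it.

```yaml
# Hypothetical Helm values sketch. Assumption: the Netdata chart exposes the
# kubelet collector as child.configs.kubelet; verify against the chart's
# values.yaml for the version you deploy (3.7.84 in this report).
child:
  configs:
    kubelet:
      # Suggestion 2: set this to false so the child pods stop running
      # the k8s_kubelet collector entirely.
      enabled: true
      path: /etc/netdata/go.d/k8s_kubelet.conf
      # Suggestion 1: this block replaces the collector's config file.
      # The interval and timeout below are illustrative values, in seconds.
      data: |
        update_every: 10
        autodetection_retry: 0
        jobs:
          - url: https://localhost:10250/metrics
            tls_skip_verify: yes
            timeout: 5
```

If only disabling is wanted, setting `enabled: false` should presumably be enough on its own. If `data` is overridden instead, it likely replaces the whole file, so any other jobs the chart defines there by default would need to be carried over.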

ilyam8 closed this as completed Mar 15, 2024