Loki is running, but connection refused and http 404 page not found. #11893

Open
dewstyh opened this issue Feb 7, 2024 · 11 comments

Comments

@dewstyh

dewstyh commented Feb 7, 2024

Describe the bug
The Loki pod is running, but the Loki service cannot be reached at http://loki:3100 or even at http://localhost:3100.
Promtail says "level=warn ts=2024-02-07T20:22:28.048083032Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post "http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
When testing the data source, Grafana says: "Unable to connect with Loki. Please check the server logs for more details."

To Reproduce
Steps to reproduce the behavior:

  1. Installed the loki-stack Helm chart version "2.10.1", which installs grafana/loki v2.6.1 and promtail v2.9.3.
  2. Installed through Terraform, with extra config settings to store logs in an S3 bucket, which is happening successfully from time to time.
  3. Linked Loki as a data source in the kube-prometheus-stack Grafana (chart version 56.2.2).

Expected behavior
Everything is created as expected (service accounts for loki-promtail, etc.), the loki pod is running, and it is shipping logs to the S3 bucket via the compactor. However, the loki service is not reachable and returns "connection refused". If you exec into the Grafana pod and run curl http://loki:3100, it returns "404 page not found".
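For reference, a few quick checks that narrow down where the request dies (a minimal sketch; the service name loki, the namespace monitoring, and the Grafana pod name are assumptions based on the output below):

# Does the loki Service have endpoints, i.e. does its selector match a Ready pod?
kubectl get svc loki -n monitoring -o wide
kubectl get endpoints loki -n monitoring

# Is Loki answering on its HTTP port inside the pod? /ready returns 200 only once
# all modules have started (assumes wget is available in the loki image).
kubectl exec -n monitoring loki-0 -- wget -qO- http://localhost:3100/ready

# Same check through the Service DNS name, e.g. from the Grafana pod (replace <grafana-pod>):
kubectl exec -n monitoring <grafana-pod> -- curl -sv http://loki.monitoring.svc.cluster.local:3100/ready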

Environment:

  • Infrastructure: aws eks
  • Deployment tool: helm

Screenshots, Promtail config, or terminal output
thagoni@iB1033 MINGW64 ~/Desktop/repositories/BaseInfrastructure (RII/SPIKE/DEVOPS-161/ServiceMeshSetUp)
$ kubectl describe pod loki-0 -n monitoring
Name: loki-0
Namespace: monitoring
Priority: 0
Service Account: loki
Node: ip-10-60-2-49.eu-west-1.compute.internal/10.60.2.49
Start Time: Wed, 07 Feb 2024 15:22:16 -0500
Labels: app=loki
apps.kubernetes.io/pod-index=0
controller-revision-hash=loki-7586d5599f
name=loki
release=loki
statefulset.kubernetes.io/pod-name=loki-0
Annotations: checksum/config: 2a120e4d0d88f524b9589fe1d544395d6be51a0a02e568da2a4c6f766cd20173
prometheus.io/port: http-metrics
prometheus.io/scrape: true
Status: Running
IP: 10.60.2.67
IPs:
IP: 10.60.2.67
Controlled By: StatefulSet/loki
Containers:
loki:
Container ID: containerd://ecca141212172e1d62f477db666cc9abd6229240bfa63416ec602daf69caf450
Image: grafana/loki:2.6.1
Image ID: docker.io/grafana/loki@sha256:1ee60f980950b00e505bd564b40f720132a0653b110e993043bb5940673d060a
Ports: 3100/TCP, 9095/TCP, 7946/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
-config.file=/etc/loki/loki.yaml
State: Running
Started: Wed, 07 Feb 2024 15:22:17 -0500
Ready: True
Restart Count: 0
Liveness: http-get http://:http-metrics/ready delay=45s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http-metrics/ready delay=45s timeout=1s period=10s #success=1 #failure=3
Environment:
AWS_STS_REGIONAL_ENDPOINTS: regional
AWS_DEFAULT_REGION: eu-west-1
AWS_REGION: eu-west-1
AWS_ROLE_ARN: arn:aws:iam::207997242047:role/loki-eks-role-staginginfra
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/data from storage (rw)
/etc/loki from config (rw)
/tmp from tmp (rw)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vd5kx (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
config:
Type: Secret (a volume populated by a Secret)
SecretName: loki
Optional: false
storage:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
kube-api-access-vd5kx:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type     Reason     Age                From               Message
----     ------     ----               ----               -------
Normal Scheduled 17m default-scheduler Successfully assigned monitoring/loki-0 to ip-10-60-2-49.eu-west-1.compute.internal
Normal Pulled 17m kubelet Container image "grafana/loki:2.6.1" already present on machine
Normal Created 17m kubelet Created container loki
Normal Started 17m kubelet Started container loki
Warning Unhealthy 16m (x2 over 17m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 16m (x2 over 17m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503

kubectl logs output

kubectl logs loki-0 -n monitoring
level=info ts=2024-02-07T20:22:17.108922813Z caller=main.go:103 msg="Starting Loki" version="(version=2.6.1, branch=HEAD, revision=6bd05c9a4)"
level=info ts=2024-02-07T20:22:17.109036425Z caller=modules.go:736 msg="RulerStorage is not configured in single binary mode and will not be started."
level=info ts=2024-02-07T20:22:17.110074064Z caller=server.go:288 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
level=warn ts=2024-02-07T20:22:17.118120348Z caller=experimental.go:20 msg="experimental feature in use" feature="In-memory (FIFO) cache - chunksfifocache"
level=info ts=2024-02-07T20:22:17.1197379Z caller=table_manager.go:252 msg="query readiness setup completed" duration=2.122µs distinct_users_len=0
level=info ts=2024-02-07T20:22:17.119780323Z caller=shipper.go:124 msg="starting index shipper in RW mode"
level=info ts=2024-02-07T20:22:17.120352007Z caller=shipper_index_client.go:79 msg="starting boltdb shipper in RW mode"
level=info ts=2024-02-07T20:22:17.121468091Z caller=table_manager.go:134 msg="uploading tables"
level=info ts=2024-02-07T20:22:17.121690467Z caller=table_manager.go:167 msg="handing over indexes to shipper"
level=info ts=2024-02-07T20:22:17.123019038Z caller=modules.go:761 msg="RulerStorage is nil. Not starting the ruler."
level=info ts=2024-02-07T20:22:17.125720801Z caller=worker.go:112 msg="Starting querier worker using query-scheduler and scheduler ring for addresses"
level=info ts=2024-02-07T20:22:17.128389809Z caller=module_service.go:82 msg=initialising module=server
level=info ts=2024-02-07T20:22:17.128550405Z caller=module_service.go:82 msg=initialising module=query-frontend-tripperware
level=info ts=2024-02-07T20:22:17.128576855Z caller=module_service.go:82 msg=initialising module=memberlist-kv
level=info ts=2024-02-07T20:22:17.128630839Z caller=module_service.go:82 msg=initialising module=store
level=info ts=2024-02-07T20:22:17.128664639Z caller=module_service.go:82 msg=initialising module=ring
level=info ts=2024-02-07T20:22:17.12876455Z caller=ring.go:263 msg="ring doesn't exist in KV store yet"
level=info ts=2024-02-07T20:22:17.128843104Z caller=module_service.go:82 msg=initialising module=usage-report
level=info ts=2024-02-07T20:22:17.129074246Z caller=module_service.go:82 msg=initialising module=distributor
level=info ts=2024-02-07T20:22:17.129146378Z caller=module_service.go:82 msg=initialising module=compactor
level=info ts=2024-02-07T20:22:17.129200132Z caller=ring.go:263 msg="ring doesn't exist in KV store yet"
level=info ts=2024-02-07T20:22:17.129238832Z caller=module_service.go:82 msg=initialising module=ingester-querier
level=info ts=2024-02-07T20:22:17.129259942Z caller=module_service.go:82 msg=initialising module=ingester
level=info ts=2024-02-07T20:22:17.129294458Z caller=ingester.go:401 msg="recovering from checkpoint"
level=info ts=2024-02-07T20:22:17.129397012Z caller=recovery.go:39 msg="no checkpoint found, treating as no-op"
level=info ts=2024-02-07T20:22:17.129446276Z caller=module_service.go:82 msg=initialising module=query-scheduler
level=info ts=2024-02-07T20:22:17.129505207Z caller=ring.go:263 msg="ring doesn't exist in KV store yet"
level=info ts=2024-02-07T20:22:17.129606292Z caller=lifecycler.go:547 msg="not loading tokens from file, tokens file path is empty"
level=info ts=2024-02-07T20:22:17.129636385Z caller=lifecycler.go:576 msg="instance not found in ring, adding with no tokens" ring=distributor
level=info ts=2024-02-07T20:22:17.129795193Z caller=lifecycler.go:416 msg="auto-joining cluster after timeout" ring=distributor
level=info ts=2024-02-07T20:22:17.129910904Z caller=basic_lifecycler.go:261 msg="instance not found in the ring" instance=loki-0 ring=compactor
level=info ts=2024-02-07T20:22:17.129948644Z caller=basic_lifecycler_delegates.go:63 msg="not loading tokens from file, tokens file path is empty"
level=info ts=2024-02-07T20:22:17.130091377Z caller=basic_lifecycler.go:261 msg="instance not found in the ring" instance=loki-0 ring=scheduler
level=info ts=2024-02-07T20:22:17.130112831Z caller=basic_lifecycler_delegates.go:63 msg="not loading tokens from file, tokens file path is empty"
level=info ts=2024-02-07T20:22:17.13022309Z caller=compactor.go:307 msg="waiting until compactor is JOINING in the ring"
level=info ts=2024-02-07T20:22:17.130242532Z caller=compactor.go:311 msg="compactor is JOINING in the ring"
level=info ts=2024-02-07T20:22:17.130278111Z caller=ingester.go:417 msg="recovered WAL checkpoint recovery finished" elapsed=1.004053ms errors=false
level=info ts=2024-02-07T20:22:17.130296029Z caller=ingester.go:423 msg="recovering from WAL"
level=info ts=2024-02-07T20:22:17.131879088Z caller=scheduler.go:617 msg="waiting until scheduler is JOINING in the ring"
level=info ts=2024-02-07T20:22:17.132145104Z caller=scheduler.go:621 msg="scheduler is JOINING in the ring"
level=info ts=2024-02-07T20:22:17.132682363Z caller=ingester.go:439 msg="WAL segment recovery finished" elapsed=3.407953ms errors=false
level=info ts=2024-02-07T20:22:17.132711332Z caller=ingester.go:387 msg="closing recoverer"
level=info ts=2024-02-07T20:22:17.132735725Z caller=ingester.go:395 msg="WAL recovery finished" time=3.450095ms
level=info ts=2024-02-07T20:22:17.133166304Z caller=lifecycler.go:547 msg="not loading tokens from file, tokens file path is empty"
level=info ts=2024-02-07T20:22:17.133207209Z caller=lifecycler.go:576 msg="instance not found in ring, adding with no tokens" ring=ingester
level=info ts=2024-02-07T20:22:17.133301923Z caller=lifecycler.go:416 msg="auto-joining cluster after timeout" ring=ingester
level=info ts=2024-02-07T20:22:17.133420954Z caller=wal.go:156 msg=started component=wal
ts=2024-02-07T20:22:17.13461987Z caller=memberlist_logger.go:74 level=warn msg="Failed to resolve loki-memberlist: lookup loki-memberlist on 172.20.0.10:53: no such host"
level=info ts=2024-02-07T20:22:18.131099981Z caller=compactor.go:321 msg="waiting until compactor is ACTIVE in the ring"
level=info ts=2024-02-07T20:22:18.131146045Z caller=compactor.go:325 msg="compactor is ACTIVE in the ring"
level=info ts=2024-02-07T20:22:18.13349622Z caller=scheduler.go:631 msg="waiting until scheduler is ACTIVE in the ring"
level=info ts=2024-02-07T20:22:18.133560786Z caller=scheduler.go:635 msg="scheduler is ACTIVE in the ring"
level=info ts=2024-02-07T20:22:18.133645279Z caller=module_service.go:82 msg=initialising module=querier
level=info ts=2024-02-07T20:22:18.133820767Z caller=module_service.go:82 msg=initialising module=query-frontend
level=info ts=2024-02-07T20:22:18.134015563Z caller=loki.go:374 msg="Loki started"
level=info ts=2024-02-07T20:22:18.342694497Z caller=memberlist_client.go:563 msg="joined memberlist cluster" reached_nodes=1
level=info ts=2024-02-07T20:22:21.134226057Z caller=scheduler.go:682 msg="this scheduler is in the ReplicationSet, will now accept requests."
level=info ts=2024-02-07T20:22:21.134251409Z caller=worker.go:209 msg="adding connection" addr=10.60.2.67:9095
level=info ts=2024-02-07T20:22:23.13125544Z caller=compactor.go:386 msg="this instance has been chosen to run the compactor, starting compactor"
level=info ts=2024-02-07T20:22:23.131429983Z caller=compactor.go:413 msg="waiting 10m0s for ring to stay stable and previous compactions to finish before starting compactor"
level=info ts=2024-02-07T20:22:28.135239576Z caller=frontend_scheduler_worker.go:101 msg="adding connection to scheduler" addr=10.60.2.67:9095
level=info ts=2024-02-07T20:23:17.121661169Z caller=table_manager.go:134 msg="uploading tables"
level=info ts=2024-02-07T20:23:17.122755904Z caller=table_manager.go:167 msg="handing over indexes to shipper"
level=info ts=2024-02-07T20:24:17.121947262Z caller=table_manager.go:134 msg="uploading tables"
level=info ts=2024-02-07T20:24:17.121939425Z caller=table_manager.go:167 msg="handing over indexes to shipper"

kubectl logs output for promtail:

level=info ts=2024-02-07T20:22:27.84838984Z caller=tailer.go:143 component=tailer msg="tail routine: started" path=/var/log/pods/linkerd-jaeger_jaeger-595975bfcd-qbzkv_162056cc-eb63-4d65-8fb0-064946b63163/jaeger/0.log
ts=2024-02-07T20:22:27.848440664Z caller=log.go:168 level=info msg="Seeked /var/log/pods/linkerd-jaeger_jaeger-595975bfcd-qbzkv_162056cc-eb63-4d65-8fb0-064946b63163/linkerd-network-validator/0.log - &{Offset:1364 Whence:0}"
level=info ts=2024-02-07T20:22:27.848477263Z caller=tailer.go:143 component=tailer msg="tail routine: started" path=/var/log/pods/linkerd-jaeger_jaeger-595975bfcd-qbzkv_162056cc-eb63-4d65-8fb0-064946b63163/linkerd-network-validator/0.log
ts=2024-02-07T20:22:27.84851703Z caller=log.go:168 level=info msg="Seeked /var/log/pods/kube-system_secrets-provider-aws-secrets-store-csi-driver-provider-awslt87x_d83584d3-f50a-4264-999c-026c4ba9ca86/provider-aws-installer/0.log - &{Offset:2337 Whence:0}"
level=info ts=2024-02-07T20:22:27.848541876Z caller=tailer.go:143 component=tailer msg="tail routine: started" path=/var/log/pods/kube-system_secrets-provider-aws-secrets-store-csi-driver-provider-awslt87x_d83584d3-f50a-4264-999c-026c4ba9ca86/provider-aws-installer/0.log
ts=2024-02-07T20:22:27.848662227Z caller=log.go:168 level=info msg="Seeked /var/log/pods/cert-manager_cert-manager-webhook-58fd67545d-nqqrs_6cd8e2d3-8670-4c53-8d24-22d4c03ceaa0/cert-manager-webhook/0.log - &{Offset:5473 Whence:0}"
ts=2024-02-07T20:22:28.033669721Z caller=log.go:168 level=info msg="Re-opening truncated file /var/log/pods/staging-ibwave_assets-service-5d649b65d6-rnbkp_5e44eb18-00f1-4c0d-9eaf-38f4dacbdc89/assets-service/0.log ..."
ts=2024-02-07T20:22:28.033925306Z caller=log.go:168 level=info msg="Successfully reopened truncated /var/log/pods/staging-ibwave_assets-service-5d649b65d6-rnbkp_5e44eb18-00f1-4c0d-9eaf-38f4dacbdc89/assets-service/0.log"
level=warn ts=2024-02-07T20:22:28.048083032Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post "http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
level=warn ts=2024-02-07T20:22:28.955735573Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post "http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
level=warn ts=2024-02-07T20:22:30.019724012Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post "http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
level=warn ts=2024-02-07T20:22:32.780896289Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post "http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
level=warn ts=2024-02-07T20:22:40.543119547Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post "http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
level=warn ts=2024-02-07T20:22:53.307863125Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post "http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
level=warn ts=2024-02-07T20:23:18.268755422Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post "http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"

Grafana error for loki logs:

2024-02-07 15:42:51.216 logger=tsdb.loki endpoint=queryData pluginId=loki dsName=Loki dsUID=P8E80F9AEF21F6940 uname=admin fromAlert=false t=2024-02-07T20:42:51.215976098Z level=error msg="Error querying loki" error="Get "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/query_range?direction=backward&end=1707338579033000000&limit=10&query=%7Bnode_name%3D%22ip-10-60-2-49.eu-west-1.compute.internal%22%7D+%7C%3D+%60%60&start=1707316979033000000&step=21600000ms\": context canceled"
and says "Failed to load log volume for this query
parse error at line 1, col 101: syntax error: unexpected IDENTIFIER"

Please help; it doesn't make sense why the connection is refused.

@withinboredom

There's a typo in the chart and it installed Loki 2.6.1 instead of 2.9.3. Set the following value in your Helm values (or use the correct image):

loki:
  image:
    tag: 2.9.3
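For example, the same override can be applied directly on upgrade (sketch only; release name and namespace assumed from the issue, values path taken from the snippet above):

helm upgrade --install loki grafana/loki-stack -n monitoring --set loki.image.tag=2.9.3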

@dewstyh

dewstyh commented Feb 8, 2024

I updated the image tag as you mentioned. Still the same problem: the pod is running, but the service returns 404 page not found, and the pod also fails its liveness and readiness probes with 503 errors in the loki pod events.

@withinboredom

It's possible I got the Helm "address" or "path" wrong, so check that the running pod is using the correct tag.
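A quick way to confirm what the pod is actually running (sketch, using the namespace from the issue):

kubectl get pod loki-0 -n monitoring -o jsonpath='{.spec.containers[0].image}'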

@FranciscoCross

I am facing the same issue; I have already tried various versions of both Loki Chart and Promtail. This problem persists for me as well. In my case, I am using EKS 1.28. I applied this configuration to other clusters with different Kubernetes versions, and it worked. This leads me to believe that it might be an issue with AWS or some add-ons.

@hvspa

hvspa commented Mar 15, 2024

Hello,

I am also facing the same issue. As suggested, I changed the image directly in the pod to 2.9.3; the pod restarted, but the readiness and liveness probes are still failing:

kubectl describe pod loki-0 | tail
  Normal   Pulled     2m9s                kubelet            Container image "grafana/loki:2.6.1" already present on machine
  Normal   Killing    63s                 kubelet            Container loki definition changed, will be restarted
  Normal   Pulling    61s                 kubelet            Pulling image "grafana/loki:2.9.3"
  Warning  Unhealthy  59s                 kubelet            Liveness probe failed: Get "http://10.124.1.121:3100/ready": dial tcp 10.124.1.121:3100: connect: connection refused
  Warning  Unhealthy  59s                 kubelet            Readiness probe failed: Get "http://10.124.1.121:3100/ready": dial tcp 10.124.1.121:3100: connect: connection refused
  Normal   Pulled     57s                 kubelet            Successfully pulled image "grafana/loki:2.9.3" in 4.605171397s (4.605252074s including waiting)
  Normal   Created    56s (x2 over 2m9s)  kubelet            Created container loki
  Normal   Started    56s (x2 over 2m8s)  kubelet            Started container loki
  Warning  Unhealthy  9s (x3 over 79s)    kubelet            Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy  9s (x3 over 79s)    kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503

Has anyone got this sorted out?

Thanks

@zensqlmonitor

Same issue for me
Warning Unhealthy 20m (x2 over 20m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 20m (x2 over 20m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 503

@hvspa

hvspa commented Mar 20, 2024

Same issue for me
Warning Unhealthy 20m (x2 over 20m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 20m (x2 over 20m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 503

Try this:

helm upgrade --install loki grafana/loki-stack --set resources.requests.cpu=100m --set resources.requests.memory=128Mi -f gptprom.yaml

cat gptprom.yaml 
promtail:
  enabled: true
  config:
    clients:
    - url: http://{{ .Release.Name }}:3100/loki/api/v1/push
    logLevel: info
    serverPort: 3101
    snippets:
      pipelineStages:
      - cri: {}
      - match:
          selector: '{app="ingress-nginx", job="default/ingress-nginx"}'
          stages:
          - regex:
              expression: '^(?P<remote_addr>[\w\.]+) - (?P<remote_user>[^ ]*) \[(?P<time_local>.*)\] "(?P<method>[^ ]*) (?P<request>[^ ]*) (?P<protocol>[^ ]*)" (?P<status>[\d]+) (?P<body_bytes_sent>[\d]+) "(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" (?P<request_length>[\d]+) (?P<request_time>[^ ]+) \[(?P<proxy_upstream_name>.*)\] \[(?P<proxy_alternative_upstream_name>.*)\] (?P<upstream_addr>[\w\.]+:\d{1,5}) (?P<upstream_response_length>[\d]+) (?P<upstream_response_time>\d+(\.\d+)?) (?P<upstream_status>[\d]+)?'

              #log-format-upstream: '$remote_addr - $remote_user [$time_local] $request $status $body_bytes_sent $http_referer $http_user_agent $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id'
      - labels:
          remote_addr:
          remote_user:
          time_local:
          method:
          request:
          protocol:
          status:
          body_bytes_sent:
          http_referer:
          http_user_agent:
          request_length:
          request_time:
          proxy_upstream_name:
          proxy_alternative_upstream_name:
          upstream_addr:
          upstream_response_length:
          upstream_response_time:
          upstream_status:

I added extra pipelineStages to parse the nginx ingress controller logs; you can skip or use them. However, I later migrated to JSON logging in nginx.conf to avoid the extra parsing; a rough sketch of that is below.
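For what it's worth, here is roughly what the promtail values could look like once the controller writes JSON logs (the field names and the -f - stdin usage are assumptions, not something taken from this thread, and nginx must be switched to a JSON log-format first):

helm upgrade --install loki grafana/loki-stack -f - <<'EOF'
promtail:
  enabled: true
  config:
    clients:
    - url: http://loki:3100/loki/api/v1/push
    snippets:
      pipelineStages:
      - cri: {}
      - json:
          expressions:
            status: status
            request: request
            upstream_addr: upstream_addr
      - labels:
          status:
EOF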

kubectl describe pod loki-0 | tail
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
kubectl get pods  | grep loki
loki-0                                                            1/1     Running   0          19h
loki-promtail-4ft2c                                               1/1     Running   0          19h
loki-promtail-pb5lz                                               1/1     Running   0          19h

@zensqlmonitor

Not better

@hvspa

hvspa commented Mar 20, 2024

If you describe your loki pod, what is the image version?

Try extracting the Loki values file from the Helm chart with this command:

tar --extract --file=/root/.cache/helm/repository/loki-stack-2.10.2.tgz loki-stack/charts/loki/values.yaml -O > loki_values.yaml

Then, in the loki_values.yaml file, change the image tag to 2.9.3:

cat loki_values.yaml |  grep -i "image:" -A2 -m1
image:
  repository: grafana/loki
  tag: 2.9.3

Then upgrade your Loki install:

helm upgrade --install loki grafana/loki-stack --set resources.requests.cpu=100m --set resources.requests.memory=256Mi -f loki_values.yaml

See if this makes it work.

PS: Just to make sure your pods pick up the new config, restart them:

kubectl rollout restart sts loki
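And to verify that the rollout picked up the new image and that Loki actually becomes ready (sketch, names as used above; assumes wget is available in the loki image):

kubectl rollout status sts loki
kubectl exec loki-0 -- wget -qO- http://localhost:3100/ready   # should return HTTP 200 once the probes pass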

@gawbul

gawbul commented Mar 21, 2024

Setting the image repository and tag in values.yaml worked for me.

@gawbul

gawbul commented Mar 22, 2024

I think the correct fix for this, however, is to update the Loki subchart in the Loki Stack chart to the latest version. It is still version 2.6.1 and should be using the latest release.
