[Loki-distributed] query error open /var/loki/chunks/ #1111

Open
danielserrao opened this issue Mar 18, 2022 · 14 comments

Comments

@danielserrao

danielserrao commented Mar 18, 2022

I have Grafana with the Loki datasource pointing to the Loki query-frontend, but I get the following error when running queries:

Query error
open /var/loki/chunks/ZmFrZS9kOGU4OGYwOTg3ZTM0NWUyOjE3ZjllMTk0NmE4OjE3ZjllMTk1NmVkOmMwMWFiYmNm: no such file or directory

Sometimes it works, and then it fails again with the same error for no reason I can see.

In the logs of the query-frontend pod I can see:

caller=logging.go:72 traceID=5c8361c04594c7a2 orgID=fake msg="GET /loki/api/v1/query_range?direction=BACKWARD&limit=1000&query=%7Bjob%3D%22fbit_k8s%22%7D&start=1647619419284000000&end=1647630219285000000&step=5 (500) 53.767877ms Response: \"open /var/loki/chunks/ZmFrZS9kOGU4OGYwOTg3ZTM0NWUyOjE3ZjllMTk0NmE4OjE3ZjllMTk1NmVkOmMwMWFiYmNm: no such file or directory\\n\" ws: false; Accept: application/json, text/plain, */*; Accept-Encoding: gzip, deflate, br; Accept-Language: en-GB,en;q=0.9,en-US;q=0.8; Sec-Ch-Ua: \" Not A;Brand\";v=\"99\", \"Chromium\";v=\"99\", \"Microsoft Edge\";v=\"99\"; Sec-Ch-Ua-Mobile: ?0; Sec-Ch-Ua-Platform: \"Windows\"; Sec-Fetch-Dest: empty; Sec-Fetch-Mode: cors; Sec-Fetch-Site: same-origin; User-Agent: Grafana/8.3.5; X-Forwarded-For: 127.0.0.1, 127.0.0.1; X-Grafana-Org-Id: 1; "

The K8s manifest rendered by "helm template" (which is what gets applied) is attached:

test.txt

I already tried multiple types of configurations, but I always get this annoying error.

Any help would be much appreciated.
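The chunk file name in the error looks like a base64-encoded chunk key, so decoding it is a quick way to see which tenant and time window the missing chunk belongs to. The sketch below assumes two key layouts, inferred from the paths posted in this thread rather than from Loki documentation: the whole key "<tenant>/<fingerprint>:<from>:<through>:<checksum>" encoded as one file name (as in the error above), or "<tenant>/<fingerprint>/" directories followed by an encoded "<from>:<through>:<checksum>" (as in a later comment). Numeric fields are assumed to be hex, with from/through in Unix milliseconds; decode_chunk_path is a hypothetical helper, not part of Loki.

```python
import base64


def decode_chunk_path(path: str) -> dict:
    """Decode a chunk file path from a "no such file or directory" error.

    Assumed layouts (inferred from paths in this issue, not from Loki docs):
      1. .../base64("<tenant>/<fingerprint>:<from>:<through>:<checksum>")
      2. .../<tenant>/<fingerprint>/base64("<from>:<through>:<checksum>")
    Numeric fields are treated as hex; from/through as Unix epoch milliseconds.
    """
    parts = path.rstrip("/").split("/")
    decoded = base64.b64decode(parts[-1], validate=True).decode("ascii")
    if "/" in decoded:
        # Layout 1: the whole key was encoded into a single file name.
        tenant, rest = decoded.split("/", 1)
        fingerprint, start, end, checksum = rest.split(":")
    else:
        # Layout 2: tenant and fingerprint are the two parent directories.
        tenant, fingerprint = parts[-3], parts[-2]
        start, end, checksum = decoded.split(":")
    return {
        "tenant": tenant,
        "fingerprint": fingerprint,
        "from_ms": int(start, 16),
        "through_ms": int(end, 16),
        "checksum": checksum,
    }


if __name__ == "__main__":
    error_path = (
        "/var/loki/chunks/"
        "ZmFrZS9kOGU4OGYwOTg3ZTM0NWUyOjE3ZjllMTk0NmE4OjE3ZjllMTk1NmVkOmMwMWFiYmNm"
    )
    info = decode_chunk_path(error_path)
    print(info)
    # start/end of the failing query_range request, converted from ns to ms.
    start_ms = 1647619419284000000 // 1_000_000
    end_ms = 1647630219285000000 // 1_000_000
    print("chunk starts inside query window:", start_ms <= info["from_ms"] <= end_ms)
```

For the path in the error above, this reports tenant "fake" (the orgID shown in the query-frontend log, used when auth_enabled is false) and a chunk window of about four seconds that falls inside the start/end range of the failing query_range request. In other words, the query path knew about a flushed chunk but could not open its file.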

danielserrao changed the title from "[Loki-distributed]" to "[Loki-distributed] query error open /var/loki/chunks/" on Mar 18, 2022
@danielserrao
Author

This started working after I switched to S3 storage with the following loki-distributed config:

loki:
  config: |
    auth_enabled: false
    chunk_store_config:
      max_look_back_period: 0s
    compactor:
      shared_store: s3
    distributor:
      ring:
        kvstore:
          store: memberlist
    frontend:
      compress_responses: true
      log_queries_longer_than: 5s
      tail_proxy_url: http://loki-distributed-querier:3100
    frontend_worker:
      frontend_address: loki-distributed-query-frontend:9095
    ingester:
      chunk_block_size: 262144
      chunk_encoding: snappy
      chunk_idle_period: 5m
      chunk_retain_period: 30s
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
      max_chunk_age: 5m
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal
    limits_config:
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
    memberlist:
      join_members:
      - loki-distributed-memberlist
    query_range:
      align_queries_with_step: true
      cache_results: true
      max_retries: 5
      results_cache:
        cache:
          enable_fifocache: true
          fifocache:
            max_size_items: 1024
            validity: 24h
      split_queries_by_interval: 15m
    ruler:
      alertmanager_url: https://alertmanager.xx
      external_url: https://alertmanager.xx
      ring:
        kvstore:
          store: memberlist
      rule_path: /tmp/loki/scratch
      storage:
        local:
          directory: /etc/loki/rules
        type: local
    schema_config:
      configs:
      - from: "2020-05-15"
        index:
          period: 24h
          prefix: index_
        object_store: s3
        schema: v11
        store: boltdb-shipper
    server:
      http_listen_port: 3100
    storage_config:
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/cache
        cache_ttl: 168h
        index_gateway_client:
          server_address: dns:///loki-distributed-index-gateway:9095
        shared_store: s3
      aws:
        bucketnames: <bucket-name>
        s3: s3://<region>
    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s

@rafaribe

I'm experiencing the same thing with a config very similar to yours, but using Azure Blob Storage.
If I query anything longer than 1h, I get this annoying message.

@aberenshtein

aberenshtein commented Apr 26, 2022

Thanks @danielserrao,
changing all filesystem references to s3 worked for me.

@kumarganesh2814

Hi

I am getting the same error. Where exactly does the config you mentioned need to be updated? I installed Loki as shown here:

[image]
Is there a ConfigMap we can update?

Best Regards
Ganesh

@kfkawalec

I have the same problem after restarting some components. Does anyone have a solution?

error: open /grafana-loki/chunks/ZmFrZS8yMDU1NjdiNzY5ZWVhZmJkOjE4MGU0ZTE3ZDJkOjE4MGU1NGZhYjg2OmRkMDQ4NWQy: no such file or directory 

But the file exists and all permissions are OK:

$ more /grafana-loki/chunks/ZmFrZS8yMDU1NjdiNzY5ZWVhZmJkOjE4MGU0ZTE3ZDJkOjE4MGU1NGZhYjg2OmRkMDQ4NWQy
rke2-ingress-nginx-controller","filename":"/var/log/pods/kube-system_rk
--More--(1%)

@liuxuzxx

I use loki-simple-scalable with an NFS storageClass.
When I select a 5-minute time range it works, but when I select a 15-minute or 1-hour range the error occurs:

open /var/loki/chunks/fake/755005aa5e414340/MTgxMTNjOGM5MGI6MTgxMTQzNmE2NTI6M2RkYjQzYmQ=: no such file or directory

When I open a shell in the write pod, the file exists!

This error occurs only some of the time.

@tobifroe

tobifroe commented May 30, 2022

This is mentioned in the chart README I think:

NOTE: In its default configuration, the chart uses boltdb-shipper and filesystem as storage. The reason for this is that the chart can be validated and installed in a CI pipeline. However, this setup is not fully functional. Querying will not be possible (or limited to the ingesters' in-memory caches) because that would otherwise require shared storage between ingesters and queriers which the chart does not support and would require a volume that supports ReadWriteMany access mode anyways. The recommendation is to use object storage, such as S3, GCS, MinIO, etc., or one of the other options documented at https://grafana.com/docs/loki/latest/storage/.

Using filesystem storage in the multi-pod setup would require multiple pods to access the same volume; without that, data is only queryable while it is still held in the ingesters' memory.
I got around this issue by installing the single-binary Loki chart.
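The README note can be illustrated with a toy model: when each pod has its own volume, a chunk the ingester flushes to its /var/loki/chunks is simply absent from the querier's /var/loki/chunks, which yields exactly this "no such file or directory" error even though the file exists in the ingester (or write) pod. In the sketch below, temporary directories stand in for pod volumes, and write_chunk/read_chunk are hypothetical stand-ins for an ingester flush and a querier fetch; it is a simplification, not Loki code.

```python
import os
import tempfile


def write_chunk(chunks_dir: str, name: str, data: bytes) -> None:
    """Hypothetical stand-in for an ingester flushing a chunk to the filesystem store."""
    with open(os.path.join(chunks_dir, name), "wb") as f:
        f.write(data)


def read_chunk(chunks_dir: str, name: str) -> bytes:
    """Hypothetical stand-in for a querier fetching a chunk it learned about."""
    with open(os.path.join(chunks_dir, name), "rb") as f:
        return f.read()


def simulate(shared_volume: bool) -> str:
    # Two temporary directories play the role of two pods' private volumes.
    with tempfile.TemporaryDirectory() as ingester_vol, tempfile.TemporaryDirectory() as querier_vol:
        # With a shared (ReadWriteMany) volume, both pods see the same directory.
        querier_dir = ingester_vol if shared_volume else querier_vol
        write_chunk(ingester_vol, "example-chunk", b"log lines")
        try:
            read_chunk(querier_dir, "example-chunk")
            return "ok"
        except FileNotFoundError as err:
            return f"query error: {err.strerror}"


if __name__ == "__main__":
    print("separate volumes:", simulate(shared_volume=False))
    print("shared volume:   ", simulate(shared_volume=True))
```

Pointing both sides at one shared directory makes the read succeed, which is the ReadWriteMany requirement the README mentions; object storage such as S3 or GCS removes the need for a shared volume altogether.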

@andretadeu

andretadeu commented Aug 29, 2022

I got things working by configuring the volumes:

loki-distributed:
  ingester:
    extraVolumes:
      - name: loki-chunks
        hostPath:
          path: "/var/loki/chunks"
          type: Directory
    extraVolumeMounts:
      - name: loki-chunks
        mountPath: "/var/loki/chunks"
  querier:
    extraVolumes:
      - name: loki-chunks
        hostPath:
          path: "/var/loki/chunks"
          type: Directory
    extraVolumeMounts:
      - name: loki-chunks
        mountPath: "/var/loki/chunks"

I also created this folder on the host, with permissions for the pods to write to it. Of course, these settings are for local directories, not for object storage such as GCS or S3.
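One caveat with the hostPath approach above: a hostPath volume is a directory on whichever node the pod runs on, so ingesters and queriers only share chunks when they are scheduled onto the same node (for example, on a single-node cluster). A quick way to reason about a given placement is sketched below; unshared_pairs is a hypothetical helper, the pod and node names are made up, and in practice the placement could be read from the NODE column of kubectl get pods -o wide.

```python
from itertools import product


def unshared_pairs(placement: dict[str, str]) -> list[tuple[str, str]]:
    """Return (querier, ingester) pod pairs that run on different nodes.

    placement maps a pod name to its node name. With hostPath volumes, each
    such pair sees different /var/loki/chunks directories, so that querier
    cannot read chunks flushed by that ingester.
    Detecting roles by substring is a simplification for this sketch.
    """
    queriers = {p: n for p, n in placement.items() if "querier" in p}
    ingesters = {p: n for p, n in placement.items() if "ingester" in p}
    return [
        (q, i)
        for (q, q_node), (i, i_node) in product(queriers.items(), ingesters.items())
        if q_node != i_node
    ]


if __name__ == "__main__":
    # Hypothetical placement: one ingester shares a node with the querier, one does not.
    placement = {
        "loki-distributed-ingester-0": "node-a",
        "loki-distributed-ingester-1": "node-b",
        "loki-distributed-querier-0": "node-a",
    }
    print(unshared_pairs(placement))
```

Any pair it reports is a combination where queries can hit the same "no such file or directory" error, which would also fit the "sometimes it works" behaviour described earlier in the thread.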

@matthewei

@aberenshtein hi, have you solved this issue? I have the same problem. I don't use object storage, just the filesystem (lvm-localpv).

@aberenshtein

Yes, but I see that the values-file references I posted are outdated.
I guess they changed in later chart versions.

@ak2766

ak2766 commented Nov 6, 2022

I'm getting this error when there's high traffic in the cluster. I managed to reproduce it by running the benchmark tool wrk. It seems that when promtail is unable to send logs to Loki due to high network traffic in my cluster, querying the Loki datasource in Grafana results in this error if the query range includes the period of high traffic.

Any solution for this?

UPDATE: I'm running the following kube-prometheus-stack components in the cluster:

$ helm -n monitoring list
NAME            NAMESPACE       REVISION        UPDATED                                         STATUS          CHART                           APP VERSION
loki            monitoring      1               2022-09-27 14:48:32.011792243 +1000 AEST        deployed        loki-distributed-0.58.0         2.6.1
prom            monitoring      1               2022-09-27 14:47:26.820679248 +1000 AEST        deployed        kube-prometheus-stack-40.1.2    0.59.1
promtail        monitoring      1               2022-09-27 14:48:23.583706894 +1000 AEST        deployed        promtail-6.4.0                  2.6.1

@jdgomeza

For me, the problem was solved by removing the default storage_config.filesystem setting that the chart template generates after applying my values.yaml file. I am using the loki-distributed Helm chart v0.63.1.

Here is the snippet that removes the extra filesystem config property:

# values.yaml
loki:
  annotations: {}

  ...  

  storageConfig:
    boltdb_shipper:
      shared_store: s3
    aws:
      s3: s3://${cluster_region}
      bucketnames: ${bucket_name}
    filesystem: null

Notice the last line, filesystem: null. It removes the reference to directory: /var/loki/chunks that was confusing the querier:

# generated configMap
apiVersion: v1
data:
  config.yaml: |
    auth_enabled: false

...

    storage_config:
      aws:
        bucketnames: bucket-for-logs
        s3: s3://${region}
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/cache
        cache_ttl: 168h
        shared_store: s3
-      filesystem:
-       directory: /var/loki/chunks
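Following the observation above, a leftover storage_config.filesystem block is worth checking for in the rendered config whenever the schema points chunks at an object store. The sketch below assumes the rendered config.yaml has already been parsed into a Python dict (for example with a YAML library, not shown); leftover_filesystem_config is a hypothetical helper, and the rule it applies is a heuristic taken from this comment, not a validation Loki itself performs.

```python
from typing import Any


def leftover_filesystem_config(config: dict[str, Any]) -> list[str]:
    """Warn when storage_config.filesystem sits next to an object-store setup.

    Heuristic based on the comment above (an assumption, not a Loki check):
    if any schema period or boltdb_shipper.shared_store uses a store other
    than "filesystem", a remaining storage_config.filesystem block is flagged.
    """
    storage = config.get("storage_config") or {}
    if not storage.get("filesystem"):
        return []  # e.g. removed via "filesystem: null" in values.yaml
    periods = (config.get("schema_config") or {}).get("configs", [])
    stores = {period.get("object_store") for period in periods}
    stores.add((storage.get("boltdb_shipper") or {}).get("shared_store"))
    others = sorted(s for s in stores if s not in (None, "filesystem"))
    if not others:
        return []
    return [
        "storage_config.filesystem is set but chunks/index use " + ", ".join(others)
    ]


if __name__ == "__main__":
    # Hypothetical parsed config, shaped like the generated configMap above
    # before the filesystem block was removed.
    rendered = {
        "schema_config": {
            "configs": [{"object_store": "s3", "store": "boltdb-shipper"}]
        },
        "storage_config": {
            "aws": {"bucketnames": "bucket-for-logs"},
            "boltdb_shipper": {"shared_store": "s3"},
            "filesystem": {"directory": "/var/loki/chunks"},
        },
    }
    for warning in leftover_filesystem_config(rendered):
        print("WARNING:", warning)
```

A pure filesystem setup (schema object_store and shared_store both "filesystem") is deliberately not flagged, since there the filesystem block is the intended store.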

@adapasuresh

I have the distributed microservices working in one cluster, but in production I started facing issues after a couple of weeks. I added a PVC to Grafana and restarted it, and now I cannot get labels in the Grafana UI; it fails with "failed to call resource "
