Ruler not working with BoltDB Shipper #3076

rajatvig · 2020-12-14T14:45:52Z

Describe the bug
As detailed in the forum issue, Ruler v2.0.0 cannot be deployed out of the box when using boltdb-shipper backed by GCS storage.
The arguments for the cache or index directory need to be passed, and this commit needs to be made available so that the Ruler deployment does not attempt to write files.
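
For reference, the cache and index directory arguments mentioned above correspond to the `boltdb_shipper` block of `storage_config`. The following is a minimal sketch, assuming a writable volume mounted at `/data` and a hypothetical bucket name; neither the paths nor the bucket are taken from the reporter's deployment:

```yaml
storage_config:
  boltdb_shipper:
    # Both directories are set explicitly to writable paths (assumed locations).
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    shared_store: gcs
  gcs:
    bucket_name: example-loki-bucket   # hypothetical bucket name
```

The empty path in the `mkdir : no such file or directory` error in the terminal output suggests that one of these directories was unset for the Ruler, although the thread does not confirm which one.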

To Reproduce
Steps to reproduce the behavior:

1. Deploy Loki v2.0.0 using Tanka with ruler and boltdb-shipper enabled
2. Ruler pods go into a crash loop

Expected behavior
Ruler to start correctly

Environment:

Infrastructure: Kubernetes
Deployment tool: jsonnet

Screenshots, Promtail config, or terminal output

level=error ts=2020-12-10T23:15:12.966824939Z caller=log.go:149 msg="error running loki" err="mkdir : no such file or directory\nerror creating index client\ngithub.com/cortexproject/cortex/pkg/chunk/storage.NewStore\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/storage/factory.go:176\ngithub.com/grafana/loki/pkg/loki.(*Loki).initStore\n\t/src/loki/pkg/loki/modules.go:287\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:103\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:75\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:204\nmain.main\n\t/src/loki/cmd/loki/main.go:130\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373\nerror initialising module: store\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:105\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:75\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:204\nmain.main\n\t/src/loki/cmd/loki/main.go:130\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373"


stale · 2021-01-14T11:33:16Z

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

rajatvig · 2021-01-15T00:53:02Z

Using Ruler 2.1.0 with BoltDB Shipper does not work either.

stale · 2021-02-21T23:09:23Z

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

maxmeyer · 2021-02-25T08:16:44Z

I face the same problem. Is there any more information required to track down this issue?

cuzzo333 · 2021-03-08T20:05:20Z

Wanted to chime in that my team is facing this exact issue as well.

nirmalpathak · 2021-03-30T11:22:43Z

I am also facing a similar issue. In my case the ruler doesn't crash, but it doesn't work either. I see the following in the Loki logs:

loki | level=info ts=2021-03-27T05:33:18.370326332Z caller=module_service.go:91 msg="module stopped" module=ruler
loki | level=info ts=2021-03-27T05:34:39.543561777Z caller=module_service.go:59 msg=initialising module=ruler
loki | level=info ts=2021-03-27T05:34:39.543585011Z caller=ruler.go:403 msg="ruler up and running"

I am using a Docker container to spin up Loki along with Grafana, Promtail, etc. My config uses BoltDB shipper backed by AWS S3 as storage.

A detailed explanation of my issue has been posted on Stack Overflow.

owen-d · 2021-05-27T13:30:36Z

Hey, I think there are two issues here. The second one is due to misconfiguration of the directory structure. Loki expects a tenant in the rule path:

/tmp/loki/rules/<tenant id>/rules1.yaml
                           /rules2.yaml

The first issue is fixed in v2.2.0+ now that we've merged #3008
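
To illustrate the directory layout described above, here is a minimal sketch of a ruler block that reads rule files from local disk. It assumes single-tenant mode (`auth_enabled: false`), in which Loki uses `fake` as the tenant ID; the scratch directory and Alertmanager address are hypothetical values, not taken from this thread:

```yaml
auth_enabled: false   # assumption: single-tenant mode, tenant ID is "fake"

ruler:
  storage:
    type: local
    local:
      # Rule files are expected under /tmp/loki/rules/<tenant id>/
      directory: /tmp/loki/rules
  rule_path: /tmp/loki/rules-temp            # assumed scratch directory
  alertmanager_url: http://localhost:9093    # hypothetical Alertmanager address
  ring:
    kvstore:
      store: inmemory
  enable_api: true
```

Under these assumptions the rule files would sit at `/tmp/loki/rules/fake/rules1.yaml` and `/tmp/loki/rules/fake/rules2.yaml`.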

yashaswee · 2021-06-30T00:49:37Z

If we are using the default rule_path ([rule_path: <filename> | default = "/rules"]), does Loki still look for a tenant ID?
I am using the Helm charts for Loki with targetRevision 2.4.1, and this is what my ruler config looks like:

ruler:
  storage:
    type: s3
    s3:
      s3: s3://region/test-grafana-loki
  alertmanager_url: http://alertmanager.xxx.xx.xx.xx:9093
  notification_timeout: 1m
  ring:
    kvstore:
      store: inmemory
  enable_api: true

I do not see any additional folder created in my S3 bucket for alerts. These are the mounts on the StatefulSet, where the rules volume is visible:

    Mounts:
      /data from storage (rw)
      /etc/loki from config (rw)
      /rules from rules (rw)
      /tmp/scratch from scratch (rw)
  Volumes:
   scratch:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   rules:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      loki-alerting-rules
    Optional:  false
   config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  loki
    Optional:    false
   storage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>

The ruler is not up according to the logs. This is the last log line for the ruler:
level=info ts=2021-06-29T15:30:24.303411371Z caller=module_service.go:91 msg="module stopped" module=ruler

Do I need to specify rule_path with the tenant ID as well?
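
One arrangement that would line up with the mounts listed above and with the tenant directory requirement mentioned earlier in the thread is to read rules from the ConfigMap mount and use the EmptyDir as scratch space. This is only a sketch under those assumptions, not a confirmed fix for this chart; whether the chart can place ConfigMap files under a tenant subdirectory is not established in this thread:

```yaml
ruler:
  storage:
    type: local
    local:
      # ConfigMap mount; files would need to sit under /rules/<tenant id>/
      directory: /rules
  rule_path: /tmp/scratch   # EmptyDir mount, used as writable scratch space
  alertmanager_url: http://alertmanager.xxx.xx.xx.xx:9093
  notification_timeout: 1m
  ring:
    kvstore:
      store: inmemory
  enable_api: true
```

With `type: s3`, by contrast, the ruler loads rules from the bucket rather than from the `/rules` mount.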

yashaswee · 2021-07-01T14:10:48Z

This is still not fixed. If you are using Helm charts, we cannot use the tenant_id.

0-sv · 2021-08-25T09:30:23Z

I had a similar error message. For some reason, this was my config file:

cat /etc/loki/loki.yaml
auth_enabled: true
chunk_store_config:
  max_look_back_period: 24h
compactor:
  shared_store: filesystem
  working_directory: /data/loki/boltdb-shipper-compactor
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  max_transfer_retries: 0
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 24h
schema_config:
  configs:
  - from: "2018-04-15"
    index:
      period: 24h
      prefix: index_
    object_store: filesystem
    schema: v9
    store: boltdb
  - from: "2021-05-13"
    index:
      period: 24h
      prefix: index_
    object_store: filesystem
    schema: v11
    store: boltdb-shipper
server:
  http_listen_port: 3100
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: true
  retention_period: 24h

I thought config[0] (the first entry under schema_config.configs) didn't make any sense. After deleting that array item, the server was able to start. So I would suggest first checking whether the binary is sound by running /usr/bin/loki on its own. If that works, add the config file back; that way you can isolate the problem.
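
Assuming config[0] refers to the first entry under `schema_config.configs` in the file above, the section after the deletion would look roughly like this, with only the boltdb-shipper period left:

```yaml
schema_config:
  configs:
  # The earlier "2018-04-15" boltdb / v9 entry has been removed.
  - from: "2021-05-13"
    index:
      period: 24h
      prefix: index_
    object_store: filesystem
    schema: v11
    store: boltdb-shipper
```

The rest of the file is unchanged in this sketch.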

stale bot added the stale label Jan 14, 2021

stale bot removed the stale label Jan 15, 2021

stale bot added the stale label Feb 21, 2021

stale bot removed the stale label Feb 25, 2021

owen-d closed this as completed May 27, 2021

yashaswee mentioned this issue Jul 1, 2021

Loki ruler not working with s3 ruler storage #3922

Closed
