Ruler not working with BoltDB Shipper #3076

Closed
rajatvig opened this issue Dec 14, 2020 · 10 comments

Comments

@rajatvig
Contributor

Describe the bug
As detailed in the forum issue, Ruler v2.0.0 cannot be deployed out of the box when using boltdb-shipper backed by GCS storage.
The cache and index directory arguments need to be passed to the ruler, and this commit must be made available so that the Ruler deployment does not attempt to write files.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy Loki using Tanka for v2.0.0 with ruler and boltdb-shipper enabled
  2. Ruler pods go into a crash loop

Expected behavior
Ruler to start correctly

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: jsonnet

Screenshots, Promtail config, or terminal output

level=error ts=2020-12-10T23:15:12.966824939Z caller=log.go:149 msg="error running loki" err="mkdir : no such file or directory\nerror creating index client\ngithub.com/cortexproject/cortex/pkg/chunk/storage.NewStore\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/storage/factory.go:176\ngithub.com/grafana/loki/pkg/loki.(*Loki).initStore\n\t/src/loki/pkg/loki/modules.go:287\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:103\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:75\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:204\nmain.main\n\t/src/loki/cmd/loki/main.go:130\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373\nerror initialising module: store\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:105\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:75\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:204\nmain.main\n\t/src/loki/cmd/loki/main.go:130\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373"
@stale

stale bot commented Jan 14, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Jan 14, 2021
@rajatvig
Contributor Author

Using the Ruler with BoltDB Shipper in v2.1.0 does not work either.

@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Jan 15, 2021
@stale

stale bot commented Feb 21, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Feb 21, 2021
@maxmeyer

I face the same problem. Is there any more information required to track down this issue?

@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Feb 25, 2021
@cuzzo333

cuzzo333 commented Mar 8, 2021

Wanted to chime in that my team is facing this exact issue as well.

@nirmalpathak

I am also facing a similar issue; however, in my case the ruler doesn't crash, but it doesn't work either. I see the following in the Loki logs:

loki | level=info ts=2021-03-27T05:33:18.370326332Z caller=module_service.go:91 msg="module stopped" module=ruler
loki | level=info ts=2021-03-27T05:34:39.543561777Z caller=module_service.go:59 msg=initialising module=ruler
loki | level=info ts=2021-03-27T05:34:39.543585011Z caller=ruler.go:403 msg="ruler up and running"

I am using Docker containers to spin up Loki along with Grafana, Promtail, etc. My config uses BoltDB Shipper backed by AWS S3 as storage.

A detailed explanation of my issue has been posted on StackOverflow.

@owen-d
Member

owen-d commented May 27, 2021

Hey, I think there are two issues here. The second one is due to misconfiguration of the directory structure. Loki expects a tenant in the rule path:

/tmp/loki/rules/<tenant id>/rules1.yaml
                           /rules2.yaml

The first issue is fixed in v2.2.0+, now that we've merged #3008.

@owen-d owen-d closed this as completed May 27, 2021
@yashaswee

yashaswee commented Jun 30, 2021

If we are using the default rule_path ([rule_path: <filename> | default = "/rules"]), does Loki still look for a tenant ID?
I am using the Helm charts for Loki (targetRevision 2.4.1), and this is what my ruler config looks like:

    ruler:
      storage:
        type: s3
        s3:
          s3: s3://region/test-grafana-loki
      alertmanager_url: http://alertmanager.xxx.xx.xx.xx:9093
      notification_timeout: 1m
      ring:
        kvstore:
          store: inmemory
      enable_api: true

I do not see any additional folder created in my S3 bucket for alerts. These are the mounts on the StatefulSet, where we can see the rules:

    Mounts:
      /data from storage (rw)
      /etc/loki from config (rw)
      /rules from rules (rw)
      /tmp/scratch from scratch (rw)
  Volumes:
   scratch:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   rules:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      loki-alerting-rules
    Optional:  false
   config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  loki
    Optional:    false
   storage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>

According to the logs, the ruler is not up. This is the last log line related to the ruler:
level=info ts=2021-06-29T15:30:24.303411371Z caller=module_service.go:91 msg="module stopped" module=ruler

Do I need to specify rule_path with the tenant_id as well?

@yashaswee

This is still not fixed. If you are using the Helm charts, you cannot use the tenant_id.

@0-sv

0-sv commented Aug 25, 2021

I had a similar error message. For some reason, this was my config file:

cat /etc/loki/loki.yaml
auth_enabled: true
chunk_store_config:
  max_look_back_period: 24h
compactor:
  shared_store: filesystem
  working_directory: /data/loki/boltdb-shipper-compactor
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  max_transfer_retries: 0
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 24h
schema_config:
  configs:
  - from: "2018-04-15"
    index:
      period: 24h
      prefix: index_
    object_store: filesystem
    schema: v9
    store: boltdb
  - from: "2021-05-13"
    index:
      period: 24h
      prefix: index_
    object_store: filesystem
    schema: v11
    store: boltdb-shipper
server:
  http_listen_port: 3100
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: true
  retention_period: 24h

I thought the first schema_config entry (config[0]) didn't make any sense. After deleting that array item, the server was able to start. So I would suggest first checking whether the binary is sound by running /usr/bin/loki on its own. If that works, add the config file back; that way you can isolate the problem.

7 participants