Issues with Volume Mounts in Helm Chart #3005

Closed · alexanderursu99 opened this issue Nov 30, 2020 · 15 comments
Labels: component/ruler, help wanted, stale

Comments

@alexanderursu99

Describe the bug
The Loki 2.0.0 Dockerfile creates the directories /loki/rules and /loki/tmprules, but the suggested ruler config in the Helm chart's values.yaml comments shows:

# Needed for Alerting: https://grafana.com/docs/loki/latest/alerting/
# This is just a simple example, for more details: https://grafana.com/docs/loki/latest/configuration/#ruler_config
#  ruler:
#    storage:
#      type: local
#      local:
#        directory: /rules
#    rule_path: /tmp/scratch
#    alertmanager_url: http://alertmanager.svc.namespace:9093
#    ring:
#      kvstore:
#        store: inmemory
#    enable_api: true

When using this config, I get the following logs while Loki is in CrashLoopBackOff:

level=info ts=2020-11-30T19:29:48.770231341Z caller=main.go:128 msg="Starting Loki" version="(version=2.0.0, branch=HEAD, revision=6978ee5d7)"
level=error ts=2020-11-30T19:29:48.770371572Z caller=log.go:149 msg="error running loki" err="mkdir /rules: read-only file system\nerror initialising module: ruler-storage\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:105\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:75\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:204\nmain.main\n\t/src/loki/cmd/loki/main.go:130\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373"

To Reproduce
Steps to reproduce the behavior:

  1. Use the provided sample ruler config with Helm chart version 2.1.0

Expected behavior
For Loki to run, and for some clarity/consistency in documentation.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: Helm
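
As a reference point for the mismatch described above, here is a minimal sketch of the same ruler block pointed at the directories the 2.0.0 image actually creates; whether those paths end up writable still depends on the chart's volume mounts and securityContext (see the comments below for the workarounds people actually used):

    ruler:
      storage:
        type: local
        local:
          directory: /loki/rules   # created by the 2.0.0 Dockerfile
      rule_path: /loki/tmprules    # also created by the 2.0.0 Dockerfile
      alertmanager_url: http://alertmanager.svc.namespace:9093
      ring:
        kvstore:
          store: inmemory
      enable_api: true
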
@alexanderursu99
Author

For additional context, I was also following the instructions to upgrade from Loki 1.5.0 to 2.0.0 using the Helm chart, if that helps.

@cyriltovena added the component/ruler, helm, and help wanted labels on Dec 4, 2020
@alexanderursu99
Author

I think you're right.
I noticed that when I used those values on the initial install, Loki would fail trying to create the directories before the volumes were mounted, I believe. I set them to some other paths that exist in the container image, then switched back afterwards, and it was fine.

@alexanderursu99 changed the title from "Mismatch of Helm Values and Dockerfile Directories" to "Issues with Volume Mounts in Helm Chart" on Dec 9, 2020
@stale

stale bot commented Jan 9, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

The stale bot added the stale label on Jan 9, 2021
@alexanderursu99
Author

Curious to hear if anyone knows what about the volume mount causes the issue on a fresh install. One suspicion I have is the Dockerfile.

The stale bot removed the stale label on Jan 12, 2021
@nabadger

nabadger commented Jan 26, 2021

Hitting this issue as well:

The rules dir generation seems to come from the Dockerfile here:

RUN mkdir -p /loki/rules && \


We got around it by mounting the rules in a different location (the path seems to be configurable).
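
A sketch of what that kind of workaround can look like in values.yaml, assuming the chart exposes extraVolumes/extraVolumeMounts and nests the Loki config under a top-level config: key; the ConfigMap name and mount path below are placeholders, not taken from this thread:

    extraVolumes:
      - name: custom-rules                # placeholder: a ConfigMap holding rule files
        configMap:
          name: loki-alerting-rules       # placeholder ConfigMap name
    extraVolumeMounts:
      - name: custom-rules
        mountPath: /etc/loki/rules/fake   # "fake" is the tenant ID Loki uses when auth is disabled
    config:
      ruler:
        storage:
          type: local
          local:
            directory: /etc/loki/rules    # the local rule store scans <directory>/<tenant>/
        rule_path: /tmp/scratch           # scratch dir for rendered rules; must be writable

The <directory>/<tenant>/ layout is also why the /rules/fake change further down this thread works.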

@Alexander-Bartosh

Alexander-Bartosh commented Jan 31, 2021

With the default suggested ruler config, rules are evaluated:

ruler:
  storage:
    type: local
    local:
      directory: /rules      # Hardcoded in Helm chart
  rule_path: /tmp/scratch    # Hardcoded in Helm chart
  alertmanager_url: http://prometheus-alertmanager
  ring:
    kvstore:
      store: inmemory
  enable_api: true
  enable_alertmanager_v2: true

Rule evaluation works, but each rule is checked twice because of the on-disk layout of the ConfigMap mount: the local rule store treats each top-level directory under the storage directory as a tenant, so the ..data symlink and the timestamped revision directory are each evaluated:
level=info ts=2021-01-31T15:34:43.578512683Z caller=metrics.go:83 org_id=..data traceID=1772340e86df484b
level=info ts=2021-01-31T15:34:44.780788012Z caller=metrics.go:83 org_id=..2021_01_30_23_26_31.647722814 traceID=123cc88620cbc543

 $ ls /rules* -la
total 12
drwxrwsrwx    3 root     loki          4096 Jan 30 23:26 .
drwxr-xr-x    1 root     root          4096 Jan 30 21:50 ..
drwxr-sr-x    2 root     loki          4096 Jan 30 23:26 ..2021_01_30_23_26_31.647722814
lrwxrwxrwx    1 root     loki            31 Jan 30 23:26 ..data -> ..2021_01_30_23_26_31.647722814
lrwxrwxrwx    1 root     root            31 Jan 30 21:50 loki-alerting-rules.yaml -> ..data/loki-alerting-rules.yaml
/ $

It looks like only ..data and the latest ConfigMap revision are evaluated:

ls /tmp/scratch/..*
/tmp/scratch/..:
scratch

/tmp/scratch/..2021_01_30_21_50_32.635166191:
loki-alerting-rules.yaml

/tmp/scratch/..2021_01_30_23_26_31.647722814:
loki-alerting-rules.yaml

/tmp/scratch/..2021_01_31_15_46_40.876645713:
loki-alerting-rules.yaml

/tmp/scratch/..data:
loki-alerting-rules.yaml

@Alexander-Bartosh

Alexander-Bartosh commented Feb 2, 2021

Was able to fix the double rule evaluation issue only by changing the Helm chart:
https://github.com/grafana/helm-charts/blob/main/charts/loki/templates/statefulset.yaml#L74
from:
/rules
to:
/rules/fake

Log:
level=info ts=2021-02-02T17:35:35.985183531Z caller=metrics.go:83 org_id=fake traceID=14b04d3fb66d19e2 latency=fast query="count_over_time(....
Sadly this cannot be changed via values.yaml.

It looks like this was addressed in the loki-distributed chart:
https://github.com/grafana/helm-charts/blob/main/charts/loki-distributed/templates/ruler/deployment-ruler.yaml#L80
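
For clarity, a sketch of what that change amounts to in the statefulset template; the volume name below is a placeholder (see the linked template), and the only real change is appending the tenant ID to the mount path. With auth disabled, Loki uses the tenant ID fake, which is why org_id=fake shows up in the log above.

    volumeMounts:
      - name: rules             # placeholder volume name
        mountPath: /rules/fake  # was /rules; "fake" = single-tenant ID when auth is disabled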

@stale

stale bot commented Mar 19, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

The stale bot added the stale label on Mar 19, 2021
@EraYaN
Contributor

EraYaN commented Apr 8, 2021

This is still a problem.

The stale bot removed the stale label on Apr 8, 2021
@owen-d
Member

owen-d commented Apr 15, 2021

Can we reopen in https://github.com/grafana/helm-charts now that we've moved ownership there?

@EraYaN
Contributor

EraYaN commented Apr 16, 2021

An organization admin can move this issue there.

@lazypower

This tripped me up quite a bit for the grafana/helm-charts/loki-distributed release.

What wound up working for me on a fresh install of the ruler was to specify local storage, and point the storage dir to where the rules wind up getting rendered.

        ruler:
          storage:
            type: local
            local:
              directory: /etc/loki/rules
          ring:
            kvstore:
              store: memberlist
          # A path to render temporary rule files. Changing this breaks things in fun ways
          rule_path: /tmp/loki/rules-temp
          enable_api: true
          enable_alertmanager_v2: true

So I'm not 100% sure why the docs don't mention that the storage path is where the ruler will scan for provided rules, but that was the missing link for me. I hope this helps others stuck in the same loop of debugging why rule files aren't loading.
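
For anyone verifying their setup: below is a hypothetical rule file of the kind the ruler expects to find under <storage directory>/<tenant>/ (for example /etc/loki/rules/fake/ with the config above, when auth is disabled). The group name, alert name, and selector are placeholders, not taken from this thread:

    groups:
      - name: example-alerts
        rules:
          - alert: HighLogErrorRate
            expr: sum(rate({app="example"} |= "error" [5m])) > 0
            for: 10m
            labels:
              severity: warning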

@stale

stale bot commented Jun 7, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

The stale bot added the stale label on Jun 7, 2021
The stale bot closed this as completed on Jun 16, 2021
@yashaswee

yashaswee commented Jul 1, 2021

I am having the same issue, using S3 to store alerts with the loki-stack Helm chart. I had to open an issue,
#3922. This is what my ruler config looks like (below). There are no errors that would help in troubleshooting this. The logs said this, but we are in single binary mode:

Jun 30 23:14:30 loki level=info ts=2021-07-01T04:14:30.810076529Z caller=modules.go:477 msg="RulerStorage is not configured in single binary mode and will not be started."

ruler:
  storage:
    type: s3
    s3:
      s3: s3://region/bucket-name
  alertmanager_url: http://alertmanager.default.svc.cluster.local:9093
  notification_timeout: 1m
  rule_path: /tmp/scratch
  ring:
    kvstore:
      store: inmemory
  enable_api: true

Can we please get this fixed, or get some examples covering this?

azuwis added a commit to azuwis/grafana-helm-charts that referenced this issue Dec 8, 2021
azuwis added a commit to azuwis/grafana-helm-charts that referenced this issue Jan 6, 2022
azuwis added a commit to azuwis/grafana-helm-charts that referenced this issue Jan 7, 2022