Prometheus reports malformed alert at wrong line number #2014

Closed
alexmbird opened this Issue Sep 21, 2016 · 8 comments


alexmbird commented Sep 21, 2016

Greetings Prometheans!

See update below...

I am trying to set up alerting for my cluster and have run into a troublesome bug. What I want is pretty standard: to alert on filesystems with less than 20% free space.

Based on examples I've seen elsewhere, it seems the correct thing to do is:

ALERT NodeFilesystemUsageHigh
    IF node_filesystem_free{} / node_filesystem_size{} < 0.2
    LABELS { severity = "page" }
    ANNOTATIONS {
    summary = "Node filesystem usage is high",
    description = "Node {{ $labels.instance }}'s filesystem {{ $labels.filesystem }} has less than 20% disk space remaining."
    }

My understanding is that the expression node_filesystem_free{} / node_filesystem_size{} < 0.2 should produce an instant vector of all metrics representing a filesystem with under 20% free. I've tried it out in the console:

  • node_filesystem_free{} / node_filesystem_size{} gives a list of all my filesystems with a value between 0 and 1 representing free space. So far so good.
  • node_filesystem_free{} / node_filesystem_size{} < 0.2 gives an empty list, because this is a good day and none of my filesystems are full (see the sketch below)
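
As a sanity check (a sketch only; the 0.99 below is just an illustrative relaxed threshold, not something I'd alert on), loosening the bound in the expression browser returns results even on a good day, which confirms the comparison itself evaluates to an instant vector:

    # Relaxed threshold, purely for testing in the expression browser; 0.99 is illustrative
    node_filesystem_free{} / node_filesystem_size{} < 0.99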

However, when I try to deploy this rule by reloading my Prometheus config, it logs the error:

time="2016-09-21T08:21:58Z" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:221"
time="2016-09-21T08:21:59Z" level=error msg="Failed to apply configuration: error loading rules, previous rule set restored: error parsing /etc/prometheus/alerts.rules: parse error at line 25, char 2: expected type vector in alert statement, got scalar" source="main.go:239"
time="2016-09-21T08:21:59Z" level=error msg="Error reloading config: one or more errors occured while applying the new configuration (-config.file=/etc/prometheus/prometheus.yml)" source="main.go:146"

I've been banging my head against this one for a few hours now. Doubtless I'm doing something wrong but no amount of googling reveals what it is. Can somebody please point me in the right direction?

In case it's relevant, here's my Prometheus build information:

Build Information
Version     1.1.2
Revision    36fbdcc30fd13ad796381dc934742c559feeb1b5
Branch  master
BuildUser   root@a74d279a0d22
BuildDate   20160908-13:12:43
GoVersion   go1.6.3

alexmbird commented Sep 21, 2016

After some time spent commenting out sections of my alerts.rules I think I've found the problem.

A completely different rule (0 != bool 1, added to test whether alerts actually fire) is misbehaving. However, Prometheus is reporting the error as being at the end of the file regardless of its real location. This led me to believe it was in a different rule.

My whole alerts.rules is:

    # ALERT AlwaysTrigger
    #   IF 0 != bool 1
    #   LABELS { severity = "page" }
    #   ANNOTATIONS {
    #     summary = "Zomg 0 != 1",
    #     description = "{{ $labels.instance }} test alert firing"
    #   }
    # 
    # ALERT InstanceDown
    #   IF up == 0
    #   FOR 5m
    #   LABELS { severity = "page" }
    #   ANNOTATIONS {
    #     summary = "Instance {{ $labels.instance }} down",
    #     description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
    #   }
    #    
    ALERT NodeFilesystemUsageHigh
      IF node_filesystem_free{} / node_filesystem_size{} < 0.2
      LABELS { severity = "page" }
      ANNOTATIONS {
        summary = "Node filesystem usage is high",
        description = "Node {{ $labels.instance }}'s filesystem {{ $labels.filesystem }} has less than 20% disk space remaining."
      }

With the first rule commented out, it works fine. With it included, Prometheus tells me the parse error is at line 25, even though the real error is on line 2.

alexmbird changed the title from Alert Expression Failing with "expected type vector in alert statement, got scalar" to Prometheus reports malformed alert at wrong line number on Sep 21, 2016


grobie commented Sep 21, 2016

We'll have to look into this, thank you. In the meantime, you could just use vector(1) to produce an always firing alert.
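
For example, a sketch of an always-firing test alert in the 1.x rule syntax (the alert name and label values are illustrative):

    # vector(1) returns a one-element instant vector, so the IF clause gets a vector rather than a scalar
    ALERT AlwaysTrigger
      IF vector(1)
      LABELS { severity = "page" }
      ANNOTATIONS {
        summary = "Test alert",
        description = "Always-firing test alert"
      }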


nicklan commented Feb 24, 2017

Any update on this? It's still occurring with v1.5.2 and we ran into it. It can make debugging an error in a large rule file nearly impossible (we had to resort to a binary search, commenting out halves of the file).

Here's a minimal example rule file that will show the problem:

ALERT Bad1
  IF increase(something-with-hyphens[5m]) > 0
  LABELS { label = "label" }

ALERT Ok1
  IF a_metric >= 1
  LABELS { label = "label" }

Using this, Prometheus will report on startup:

ERRO[0000] Failed to apply configuration: error loading rules, previous rule set restored: error parsing first.rules: parse error at line 8, char 2: binary expression must contain only scalar and instant vector types  source=main.go:275

So it's still just reporting the final line of the file.
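
For completeness, Bad1 is malformed because hyphens are not valid in metric names: something-with-hyphens parses as a subtraction whose right-hand operand is a range vector, which is what the error message complains about. A sketch of a variant that parses, using an illustrative underscore name:

    # Metric name is illustrative; underscores instead of hyphens make this a plain instant vector expression
    ALERT Ok2
      IF increase(something_with_underscores[5m]) > 0
      LABELS { label = "label" }

Either way, the reported error location should point at Bad1, not at the final line of the file.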


GubendiranM commented Jun 28, 2017

D:\prometheus\new_prom>prometheus.exe
INFO[0000] Starting prometheus (version=2.0.0-alpha.3, branch=master, revision=70f96b0ffb6567100ffc91f7c3fe4e57c8d9dedb)  source="main.go:196"
INFO[0000] Build context (go=go1.8.3, user=root@5630fb1ab539, date=20170622-10:13:42)  source="main.go:197"
INFO[0000] Host details (windows)  source="main.go:198"
INFO[0000] Starting tsdb  source="main.go:210"
INFO[0000] tsdb started  source="main.go:216"
INFO[0000] Loading configuration file prometheus.yml  source="main.go:344"
ERRO[0000] yaml: unmarshal errors: line 1: cannot unmarshal !!str ALERT s... into rulefmt.RuleGroups  source="manager.go:484"
ERRO[0000] Failed to apply configuration: error loading rules, previous rule set restored  source="main.go:362"
ERRO[0000] Error loading config: one or more errors occurred while applying the new configuration (-config.file=prometheus.yml)  source="main.go:265"

Help me... I get the above error with the configuration below in Prometheus.

prometheus.yaml

global:
  scrape_interval: 1s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 1s # Evaluate rules every 15 seconds. The default is every 1 minute.
  external_labels:
    monitor: 'codelab-monitor'

rule_files:
  - rules.conf

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:1234']
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:1234']

rules.conf
ALERT service_down
  IF up == 0
  ANNOTATIONS {
    summary = "Instance is down",
    description = "Instance is down restart to resolve"
  }


brian-brazil commented Jun 28, 2017

@GubendiranM Please don't add support requests on unrelated bugs. https://groups.google.com/forum/#!aboutgroup/prometheus-users is the best place to ask questions.


GubendiranM commented Jun 28, 2017

ok @brian-brazil
thank you for the reference


brian-brazil commented Dec 8, 2017

This is obsolete, and #3549 improved the new way of doing things.
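
For anyone converting old rules, a rough sketch of the alert from this issue in the 2.x YAML rule format (the group name is illustrative):

    # Sketch of the 2.x rulefmt equivalent; group name is illustrative
    groups:
      - name: node_alerts
        rules:
          - alert: NodeFilesystemUsageHigh
            expr: node_filesystem_free / node_filesystem_size < 0.2
            labels:
              severity: page
            annotations:
              summary: Node filesystem usage is high
              description: "Node {{ $labels.instance }}'s filesystem {{ $labels.filesystem }} has less than 20% disk space remaining."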


lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators on Mar 23, 2019
