getting null in response to api/v1/series #1698

Closed
certainmagic opened this Issue Jun 1, 2016 · 8 comments

certainmagic commented Jun 1, 2016

first, let me say that prometheus is great. along with grafana, we've just started using it and it is really helping us tell what's going on in our servers. so thanks!

i noticed that my grafana dashboards were choking on some of my template variable queries. it looks like the problem is that the response contains nulls.

here's the query:

http://prometheus:9090/api/v1/series?match[]=http_request_duration_seconds_bucket

here's a snippet of the response with almost all of the data elements removed:

{
    "data": [
        {
            "__name__": "http_request_duration_seconds_bucket",
            "instance": "1",
            "le": "20"
        },
        null,
        {
            "__name__": "http_request_duration_seconds_bucket",
            "instance": "2",
            "le": "0.1"
        }
    ],
    "status": "success"
}
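Until the server stops emitting them, a client can skip the null entries itself. The following is a minimal sketch of that workaround in Python, not anything Prometheus or Grafana ships: `drop_null_series` and `fetch_series` are hypothetical helper names, and the sample payload is the snippet above.

```python
import json
import urllib.parse
import urllib.request


def drop_null_series(payload):
    """Return the label sets from a /api/v1/series payload, skipping null entries."""
    if payload.get("status") != "success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    return [series for series in payload.get("data", []) if series is not None]


def fetch_series(base_url, matcher):
    """Query /api/v1/series with one match[] selector and drop null entries.

    base_url is an assumption, e.g. "http://prometheus:9090".
    """
    query = urllib.parse.urlencode({"match[]": matcher})
    with urllib.request.urlopen(f"{base_url}/api/v1/series?{query}") as resp:
        return drop_null_series(json.load(resp))


# The response snippet from above, parsed locally so no server is needed.
sample = json.loads("""
{
    "data": [
        {"__name__": "http_request_duration_seconds_bucket", "instance": "1", "le": "20"},
        null,
        {"__name__": "http_request_duration_seconds_bucket", "instance": "2", "le": "0.1"}
    ],
    "status": "success"
}
""")
cleaned = drop_null_series(sample)
print(len(cleaned), "series kept out of", len(sample["data"]))
```

This only hides the symptom on the client side; the nulls still point at a problem in the server's storage.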

i had this problem on 0.18.0 (prometheus, version 0.18.0 (branch: release-0.18, revision: f12ebd6)).
i also tried running with 0.19.2 just in case it had been fixed recently, but the problem continues in 0.19.2.

I asked for pointers in this thread.

Björn asked me to move my data aside and run again. That fixed the problem. I think that both of the servers that had this problem had run out of disk space earlier. I've also found that we were killing them ungracefully 90 seconds after sending a TERM to shut them down, while checkpoints are currently taking about 2 minutes, so they weren't shutting down cleanly. I've seen it run a crash recovery after that. We are addressing the disk space issues and have extended the timeout to let it checkpoint cleanly on shutdown.
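The shutdown fix described above amounts to waiting longer than the checkpoint takes before escalating from TERM to KILL. A rough sketch of that pattern in Python follows, assuming a supervisor that owns the process handle; `stop_gracefully` is a hypothetical helper, the child here is a stand-in rather than a real Prometheus process, and the 300-second grace period is an assumed value chosen only to exceed the roughly 2-minute checkpoint mentioned above.

```python
import subprocess
import sys


def stop_gracefully(proc, grace_seconds):
    """Send TERM, wait up to grace_seconds, and only then fall back to KILL.

    Returns True if the process exited on its own after TERM.
    """
    proc.terminate()  # SIGTERM on POSIX systems
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: the process gets no chance to checkpoint
        proc.wait()
        return False


# Stand-in child process; a real setup would manage the prometheus process instead.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exited_cleanly = stop_gracefully(child, grace_seconds=300)
print("exited after TERM:", exited_cleanly)
```

Most init systems and container runtimes expose an equivalent stop-timeout setting, which is usually a better place for this than a custom script.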

Björn suggested that there's a good chance it was corrupted indices.

Björn asked me to file a bug suggesting that prometheus respond better to corruption by not returning the nulls and logging an error.

Please let me know if there's any more information I can provide.

thanks again!
ab

Member

juliusv commented Jun 2, 2016

Theory on how this can happen when the storage is corrupted:

Callstack:

Member

juliusv commented Jun 2, 2016

My storage fu currently doesn't go deep enough to immediately say on which level this should be filtered out: whether archivedMetric() should raise an error if it cannot find a value, or whether metricForRange() should just filter out nil metrics.
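The two options can be contrasted in a small model. This is an illustrative Python sketch only: the dictionary, the function names and the exception class are hypothetical stand-ins, not the Prometheus Go code, and the fingerprint values are made up.

```python
class MissingMetricError(Exception):
    """Hypothetical error for a fingerprint with no archived metric."""


# Stand-in for an index with a gap: fingerprint 2 has no entry.
archived = {
    1: {"__name__": "up", "instance": "1"},
    3: {"__name__": "up", "instance": "2"},
}


def lookup_lenient(fingerprint):
    """A lookup that silently yields None for a missing entry."""
    return archived.get(fingerprint)


def lookup_strict(fingerprint):
    """Option 1: the lookup itself reports a missing entry as an error."""
    try:
        return archived[fingerprint]
    except KeyError:
        raise MissingMetricError(f"no archived metric for fingerprint {fingerprint}") from None


def metrics_filtered(fingerprints):
    """Option 2: the caller drops missing entries before returning results."""
    found = (lookup_lenient(fp) for fp in fingerprints)
    return [metric for metric in found if metric is not None]


unfiltered = [lookup_lenient(fp) for fp in (1, 2, 3)]  # contains a None, like the null in the API response
filtered = metrics_filtered([1, 2, 3])
print(len(unfiltered), len(filtered))
```

Option 1 surfaces the corruption where it is detected, so it can be logged; option 2 keeps the response clean but says nothing about the gap unless logging is added separately.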

@beorn7 beorn7 self-assigned this Jun 2, 2016

Member

beorn7 commented Jun 2, 2016

I'll look into it once I find time.

Member

beorn7 commented Nov 2, 2016

With the changes that quarantine series that ran into an error, this should be solved.
Has this shown up again?


cwarden commented Feb 9, 2017

I think I'm seeing the same problem in 1.5.1

$ curl -s -g 'http://prometheus:9090/api/v1/series?match[]={instance=~".*dev.*"}' | jq .
{
  "status": "success",
  "data": [
    null,
    null,
    ...
  ]
}
$ curl -s -XDELETE -g 'http://prometheus:9090/api/v1/series?match[]={instance=~".*dev.*"}' | jq .
{
  "status": "success",
  "data": {
    "numDeleted": 319
  }
}
$ curl -s -g 'http://prometheus:9090/api/v1/series?match[]={instance=~".*dev.*"}' | jq .
{
  "status": "success",
  "data": [
    null,
    null,
    ...
  ]
}
$ curl -s -XDELETE -g 'http://prometheus:9090/api/v1/series?match[]={instance=~".*dev.*"}' | jq .
{
  "status": "success",
  "data": {
    "numDeleted": 319
  }
}

@brian-brazil brian-brazil added this to the v2.x milestone Apr 6, 2017

@fabxc fabxc removed this from the v2.x milestone Jul 3, 2017

@beorn7 beorn7 removed their assignment Aug 8, 2017

Member

brian-brazil commented Nov 17, 2017

I'm going to presume this is resolved with 2.0, as the implementation all changed. If not, please let us know.

Author

certainmagic commented Nov 17, 2017

thanks! will do. :)


lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019
