Don't work with the latest version api (v2) alertmanager #115

Closed
z1oy opened this issue Oct 4, 2018 · 39 comments

@z1oy z1oy commented Oct 4, 2018

Hi!

In the new version of Alertmanager the API version changes to v2, so the /api/v1/ URI will no longer work. The dashboard stops working: the request to http://alertmanager:9093/api/v1/alerts/groups fails with 404 Not Found.

Docker image used for testing: prometheus/alertmanager:master

Owner

@prymitive prymitive commented Oct 4, 2018

I see that v2 is also not yet fully finalised:
"API v2 is still under heavy development and thereby subject to change."
which is ok, it just means that we still need to track support for it per alertmanager release, rather than api version.
It should be easier to add v2 support, since the client can be generated, I'll need to look into that.
Adding integration tests would be nice

Contributor

@sylr sylr commented Dec 13, 2018


@mxinden mxinden commented Dec 14, 2018

Hi there,

this is Max, maintainer of Alertmanager and developer of the Alertmanager API v2.

> which is ok, it just means that we still need to track support for it per alertmanager release, rather than api version.

Yes, sadly we cannot guarantee that there will never be any breaking changes within the API v2, but not introducing any breaking changes within the major version is very much a goal.

> It should be easier to add v2 support, since the client can be generated, I'll need to look into that.

Let me know how that goes. I am happy to help.

Owner

@prymitive prymitive commented Dec 14, 2018

Hi @mxinden, thanks for jumping in. I fully understand that you might not want to promise too much too soon when it comes to the API, and it really isn't an issue. I generate mock responses from the API for each alertmanager release and use them when running tests. I would continue doing that even if there was a strong API guarantee, so I don't consider this an issue.
I haven't looked at v2 at all yet, but I fully support that change and appreciate the offer of help. I hope to have some time to work on this soon, and I'll be sure to let you know if I run into any issues.


@mxinden mxinden commented Dec 14, 2018

> I generate mock responses from the API for each alertmanager release

Fancy. Hopefully you can now also generate those mocks from the OpenAPI specification.

Owner

@prymitive prymitive commented Dec 14, 2018

Yeah, the API doesn't really matter: I just spin up an alertmanager instance, send it alerts, and then curl and save the API response (https://github.com/prymitive/karma/tree/master/internal/mock).
It's not perfect but allows me to test if I can parse API responses from every release and still have all keys I expect.
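
The kind of check described here (parse a saved API response and confirm the expected keys are still present) could be sketched roughly as below. This is only an illustration in Python, not karma's actual test code; `REQUIRED_ALERT_KEYS`, `missing_keys`, and the sample payload are all made up for the example.

```python
import json

# Hypothetical set of keys a client expects on every alert object.
REQUIRED_ALERT_KEYS = {"labels", "annotations", "startsAt", "endsAt"}


def missing_keys(raw_response):
    """Return {alert index: sorted missing keys} for a saved JSON list of alerts."""
    alerts = json.loads(raw_response)
    problems = {}
    for index, alert in enumerate(alerts):
        missing = REQUIRED_ALERT_KEYS - set(alert)
        if missing:
            problems[index] = sorted(missing)
    return problems


# Made-up sample standing in for a response saved with curl.
saved = json.dumps([
    {"labels": {"alertname": "Down"}, "annotations": {}, "startsAt": "t0", "endsAt": "t1"},
    {"labels": {"alertname": "Slow"}, "annotations": {}},
])
print(missing_keys(saved))
```

An empty result means every alert in the saved response still carries the keys the client relies on.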

Owner

@prymitive prymitive commented Dec 31, 2018

@mxinden I've finally started looking into switching to openapi and I see that the grouping information is no longer exposed via the API, as it was removed via prometheus/alertmanager#1525.
This flattens the API, so it seems like a good idea, but it doesn't seem that the grouping configuration is exposed in any other way (except for manually parsing config dump on /status endpoint, which seems a bit hackish). Is that correct?
I do get all receiver names each alert got routed via, so if I had also access to the list labels used for each receiver I could reconstruct that grouping. Do you think it could be added? Maybe under /receivers ?

Owner

@prymitive prymitive commented Jan 9, 2019

@mxinden it looks like the missing grouping information will be tricky to expose (as per prometheus/alertmanager#1694). I managed to add the routing config to the v2 api, but it seems like I would need to reimplement that logic on the karma side if I want to keep presenting alerts grouped the same way as the notifications sent by alertmanager.

Owner

@prymitive prymitive commented Jan 9, 2019

Without group_by information we can:

  1. Make it configurable in the config file
  2. Add filter expressions for grouping (@group_by=alertname,cluster)

The other option is to add /api/v2/groups that exposes groups as held by the Dispatcher, but that would basically restore /api/v1/alerts/groups, which was removed via prometheus/alertmanager#1525. Not sure if that would be acceptable (@mxinden?).
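
For illustration, client-side grouping driven by a single list of label names (options 1 and 2 above) might look like the sketch below. It is a rough Python example under stated assumptions: the `@group_by=` expression syntax is only the idea floated in this comment, and `parse_group_by`, `group_alerts`, and the alert shape (a dict with a `labels` dict) are hypothetical names chosen for the example.

```python
from collections import defaultdict


def parse_group_by(expression):
    """Parse a hypothetical '@group_by=alertname,cluster' filter expression."""
    prefix = "@group_by="
    if not expression.startswith(prefix):
        raise ValueError(f"not a group_by expression: {expression!r}")
    names = expression[len(prefix):].split(",")
    return [name.strip() for name in names if name.strip()]


def group_alerts(alerts, group_by):
    """Group a flat list of alerts by the values of the given label names."""
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert.get("labels", {})
        # In this sketch a missing label is treated as an empty string.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"labels": {"alertname": "Down", "cluster": "a", "instance": "1"}},
    {"labels": {"alertname": "Down", "cluster": "a", "instance": "2"}},
    {"labels": {"alertname": "Down", "cluster": "b", "instance": "3"}},
]
grouped = group_alerts(alerts, parse_group_by("@group_by=alertname,cluster"))
for key, members in grouped.items():
    print(key, len(members))
```

Note that this applies one grouping to every alert, which is exactly the limitation compared with Alertmanager's per-route grouping.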


@mxinden mxinden commented Jan 16, 2019

> if I had also access to the list labels used for each receiver

Would you mind adding some more details here?

Adding @stuartnelson3, maybe he has more thoughts on this topic.


@stuartnelson3 stuartnelson3 commented Jan 16, 2019

The issue referencing this (display alerts in a UI as they're actually grouped in the AM server) exists here: prometheus/alertmanager#868

Exposing configured groupings seems to be desired, and I was even talking with @beorn7 today about adding an additional page that specifically lists collapsed receivers lists containing active alerts, groupings that go to that receiver, and potentially the ability to manually notify the configured group (to be discussed, just an idea).

Providing a flat list is helpful, and we tried to add filtering to make that more useful, but it does only allow a consumer of the API to set a single group_by, whereas exposing the actually-configured groupings could be more interesting to users.

I believe within AM we even give each grouping its own fingerprint, which would allow direct links to that group.

maybe

  • /api/v2/groups (feels silly since we just removed it but maybe that was premature)
  • /api/v2/groups/:fingerprint, which could be added to the templating language

If this is interesting we could open an issue in AM and start a formal discussion? It appears we removed it initially because am ui/amtool wasn't consuming from it, but it seems like we would want it to offer the "natural grouping" view, and there are external users wanting it.
@mxinden @simonpasquier @beorn7 @prymitive

Owner

@prymitive prymitive commented Jan 16, 2019

> > if I had also access to the list labels used for each receiver
>
> Would you mind adding some more details here?

That was me misunderstanding receivers and routes: I thought routes were more like a switch statement with a flat mapping, and that there is always the same group_by for each receiver. I opened prometheus/alertmanager#1694 with that assumption, thinking that if the routing config were exposed I could group everything on the client side.
But after looking at how it works I found that there's a lot of complexity in parsing it (since it can be a deeply nested tree), so it would basically require re-implementing the alertmanager routing logic client side, which doesn't sound like a good idea to me.

I agree with a lot of points about UX on prometheus/alertmanager#868.
So far I like the idea of having /api/v2/groups the most, since it directly exposes alertmanager internal state of grouping. I could have client side grouping but it would be limited compared to alertmanager grouping, unless I implement configuration logic as flexible as alertmanager routing logic.
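
To give a feel for why reconstructing groupings from the routing config is more than a flat lookup, here is a heavily simplified Python sketch of walking a nested route tree. It is an assumption-laden illustration, not Alertmanager's implementation: it only handles equality matchers under a `match` key, a `continue` flag, and inheritance of `receiver` and `group_by` from the parent route, and it ignores regex matchers and everything else. The function name and the sample tree are made up.

```python
def matching_routes(route, labels, parent_receiver=None, parent_group_by=()):
    """Return [(receiver, group_by)] for the routes an alert would land in."""
    # Unset fields are inherited from the parent route in this sketch.
    receiver = route.get("receiver", parent_receiver)
    group_by = tuple(route.get("group_by", parent_group_by))

    results = []
    for child in route.get("routes", []):
        matchers = child.get("match", {})
        if all(labels.get(name) == value for name, value in matchers.items()):
            results.extend(matching_routes(child, labels, receiver, group_by))
            # Stop at the first matching child unless it asks to continue.
            if not child.get("continue", False):
                break
    # If no child matched, the alert stays on the current route.
    if not results:
        results.append((receiver, group_by))
    return results


# Made-up routing tree for the example.
tree = {
    "receiver": "default",
    "group_by": ["alertname"],
    "routes": [
        {
            "match": {"team": "db"},
            "receiver": "db-team",
            "group_by": ["alertname", "cluster"],
            "routes": [{"match": {"severity": "page"}, "receiver": "db-pager"}],
        },
        {"match": {"team": "web"}, "receiver": "web-team", "continue": True},
        {"match": {"env": "prod"}, "receiver": "prod"},
    ],
}
print(matching_routes(tree, {"team": "db", "severity": "page"}))
print(matching_routes(tree, {"team": "web", "env": "prod"}))
```

Even this toy version needs recursion, inheritance, and early exit, and a single alert can end up under several different groupings, which is the complexity a client would have to mirror.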

Owner

@prymitive prymitive commented Jan 16, 2019

Raised prometheus/alertmanager#1712 to re-add groups to the api

prymitive added a commit that referenced this issue Jan 22, 2019
>= 0.16 doesn't yet work due to #115
prymitive added a commit that referenced this issue Jan 22, 2019
>= 0.16 doesn't yet work due to #115

Owner

@prymitive prymitive commented Feb 5, 2019

Is there anything I can do to help with this on alertmanager side?


@beorn7 beorn7 commented Feb 5, 2019

I'm afraid you need intimate knowledge of the Alertmanager codebase to do so. @stuartnelson3 told me, prometheus/alertmanager#868 is an easy fix, if you have that knowledge. Now we only need to make him do it. (And yes, that is the plan once he is back from vacations.)

Owner

@prymitive prymitive commented Feb 5, 2019

Thanks, I think that the work for prometheus/alertmanager#868 will enrich the API with enough information to effectively solve this issue here, so I'll just wait for that to happen. Let me know if there's anything I can help with.


@ScrumpyJack ScrumpyJack commented Feb 14, 2019

does this mean that lmierzwa/karma:latest won't work with alertmanager 0.16.1?

Owner

@prymitive prymitive commented Feb 14, 2019

It won't work with alertmanager >= 0.16.0, since the API endpoint used by karma got removed in that version and there is currently no alternative exposing the same information.
We're waiting on prometheus/alertmanager#868 to be resolved for this ticket to be unblocked.

mh720 pushed a commit to swarmstack/swarmstack that referenced this issue Feb 14, 2019


@dswarbrick dswarbrick commented Feb 14, 2019

I'm confused. The release notes for 0.16.0 say:

> API v1 will be removed with Alertmanager release v0.18.0.

Shouldn't that mean that Karma should continue to work with v0.16.x, v0.17.x?

Owner

@prymitive prymitive commented Feb 14, 2019

> I'm confused. The release notes for 0.16.0 say:
>
> > API v1 will be removed with Alertmanager release v0.18.0.
>
> Shouldn't that mean that Karma should continue to work with v0.16.x, v0.17.x?

I guess that the changelog isn't 100% accurate. What it means is that the entire api v1 is planned to be removed in v0.18, but some parts of v1 were already removed (prometheus/alertmanager#1525):

> [CHANGE] Remove `api/v1/alerts/groups` GET endpoint (#1508)


@mpursley mpursley commented Feb 15, 2019

See also... Incompatibility with alertmanager 0.16.x
cloudflare/unsee#273


@mpursley mpursley commented Feb 15, 2019

I wonder if it makes sense to ask the Alertmanager guys if we can just put api/v1/alerts/groups back in? E.g. re-add the code that was removed in this PR: prometheus/alertmanager#1525

As:

  1. This is the only endpoint from the V1 API that was removed, and
  2. It sounds like this endpoint was just removed for reasons like "this is a sizeable chunk of code that probably hasn't been used for over a year". Though, there are some remote clients (like unsee/karma) that are using this endpoint, so might be worth putting it back in to support these API clients?

I'll ask them here... prometheus/alertmanager#1525 (comment)


@ScrumpyJack ScrumpyJack commented Mar 19, 2019

Are we good now?

prometheus/alertmanager#1791

Owner

@prymitive prymitive commented Apr 10, 2019

I managed to test changes from prometheus/alertmanager#1791 and it all looks good. But I need to do a fair amount of refactoring to add support for the new API, so it will take me a few days. Should be ready soon

Owner

@prymitive prymitive commented Apr 18, 2019

prometheus/alertmanager#1791 got merged. I still need to finish the refactoring required to add support for it; it should be done in a few days.

Owner

@prymitive prymitive commented Apr 24, 2019

Got a mostly working PR (#644), it just needs a few finishing touches. I think that I'll need to drop v1 support in the not too distant future; that will allow me to clean up lots of ugly code I have.

Owner

@prymitive prymitive commented Apr 24, 2019

Merged. Alertmanager >= 0.17.0 should be supported; now we need to wait for that release.


@stuartnelson3 stuartnelson3 commented Apr 24, 2019

My plan is to release 0.17.0 next week. Thanks for your patience throughout all of this.


@matejzero matejzero commented Apr 25, 2019

Thank you both for making this happen. We are just deploying Prometheus to production and this update will be very useful.


@mh720 mh720 commented May 2, 2019

karma

prom/alertmanager:master (v0.17.0)
lmierzwa/karma:v0.33

sh-4.2# curl http://alertmanager:9093/api/v1/alerts/groups
404 page not found
sh-4.2# curl http://alertmanager:9093/api/v2/alerts/groups
[]
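
Putting the version details from this thread together (the v1 groups endpoint is gone from 0.16.0 on, and the v2 groups endpoint responds on 0.17.0), a client could pick the endpoint from the reported Alertmanager version along these lines. This is a small Python sketch, not karma's code; `parse_version` and `groups_endpoint` are hypothetical helpers, and pre-release suffixes such as `-rc.1` are deliberately not handled.

```python
def parse_version(version):
    """Turn '0.17.0' into (0, 17, 0). Suffixes like '-rc.1' are not handled."""
    return tuple(int(part) for part in version.split("."))


def groups_endpoint(version):
    """Return the alert groups path for a version, or None if there is none."""
    parsed = parse_version(version)
    if parsed < (0, 16, 0):
        # Old releases still serve the v1 groups endpoint.
        return "/api/v1/alerts/groups"
    if parsed < (0, 17, 0):
        # 0.16.x removed the v1 endpoint and has no replacement.
        return None
    return "/api/v2/alerts/groups"


for version in ("0.15.3", "0.16.1", "0.17.0"):
    print(version, groups_endpoint(version))
```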


@PsychoSid PsychoSid commented May 3, 2019

> sh-4.2# curl http://alertmanager:9093/api/v2/alerts/groups
> []

As v0.33 was released 20 days ago and #644 was only merged 10 days ago, I am guessing that a compatible version is yet to be officially released.

Owner

@prymitive prymitive commented May 3, 2019

There will be a new release soon, I'm waiting for alertmanager docker images to be pushed (prometheus/alertmanager#1874). Once that's done I'll need to do a final round of testing and small fixes


@mohsen0 mohsen0 commented May 3, 2019

Can we have a release, please?

Owner

@prymitive prymitive commented May 3, 2019

v0.34 was released today - https://github.com/prymitive/karma/releases/tag/v0.34
Binaries and docker image should get published in an hour or so

@prymitive prymitive closed this May 3, 2019

Contributor

@davidkarlsen davidkarlsen commented May 3, 2019


@dopafon dopafon commented May 8, 2019

I just tried v0.34 and v0.35 (docker images) against alertmanager 0.17.0.
Unfortunately I cannot get it working.
The log output looks like it is trying to talk to the alertmanager v1 api still (see below).
Am I missing something?

time="2019-05-08T08:59:47Z" level=info msg="Initial Alertmanager query"
time="2019-05-08T08:59:47Z" level=info msg="Pulling latest alerts and silences from Alertmanager"
time="2019-05-08T08:59:47Z" level=info msg="[default] Collecting alerts and silences"
time="2019-05-08T08:59:47Z" level=info msg="GET https://user:pass@alertmanager.example.com:9093/alerts/metrics timeout=40s"
time="2019-05-08T08:59:47Z" level=error msg="[default] https://user:pass@alertmanager.example.com:9093/alerts/metrics request failed: request to https://user:pass@alertmanager.example.com:9093/alerts/metrics failed with 404 Not Found"
time="2019-05-08T08:59:47Z" level=info msg="GET https://user:pass@alertmanager.example.com:9093/alerts/api/v1/status timeout=40s"
time="2019-05-08T08:59:47Z" level=error msg="[default] https://user:pass@alertmanager.example.com:9093/alerts/api/v1/status request failed: request to https://user:pass@alertmanager.example.com:9093/alerts/api/v1/status failed with 404 Not Found"
time="2019-05-08T08:59:47Z" level=error msg="[default] unknown error (status 401): {resp:0xc0005d83f0} "
time="2019-05-08T08:59:47Z" level=info msg="Pull completed"
time="2019-05-08T08:59:47Z" level=info msg="Done, starting HTTP server"
time="2019-05-08T08:59:47Z" level=info msg="Listening on :8080"
time="2019-05-08T08:59:56Z" level=info msg="[100.64.9.1 MIS] <200> GET /alerts.json took 1.506762ms"
time="2019-05-08T08:59:59Z" level=info msg="[100.64.9.1] GET / took 515.388µs"
time="2019-05-08T09:00:02Z" level=info msg="[100.64.9.1 HIT] <200> GET /alerts.json took 357.478µs"

Owner

@prymitive prymitive commented May 8, 2019

v1 is only used for /api/v1/status, which is still present on 0.17.0. Maybe it's a problem with /alerts/: is your alertmanager really mounted on alertmanager.example.com:9093/alerts as the root URI?
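
To show how a stray path prefix in the configured URI ends up in every request, here is a minimal Python sketch of joining a base URI with an API path. It is only an illustration of the effect visible in the log above, not how karma actually builds its URLs; `endpoint_url` is a hypothetical helper.

```python
def endpoint_url(base_uri, path):
    """Join a configured base URI and an API path with exactly one slash."""
    return base_uri.rstrip("/") + "/" + path.lstrip("/")


# With a leftover '/alerts' prefix, every request inherits it.
print(endpoint_url("https://alertmanager.example.com:9093/alerts", "/api/v1/status"))
# With the bare root URI the path is what Alertmanager serves.
print(endpoint_url("https://alertmanager.example.com:9093/", "/api/v1/status"))
```

The first URL matches the failing request in the log; unless Alertmanager is actually served under that prefix, it will answer 404.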


@dopafon dopafon commented May 8, 2019

You are right, the /alerts path was wrong (it was a leftover from some testing). I changed it to alertmanager.example.com:9093/ and the 404s are gone.
However, I still get the unknown error (status 401); see the log output below.

time="2019-05-08T09:29:26Z" level=info msg="[default] Remote Alertmanager version: 0.17.0"
time="2019-05-08T09:29:26Z" level=error msg="[default] unknown error (status 401): {resp:0xc0002c7170} "

Owner

@prymitive prymitive commented May 8, 2019

401 is an authentication error. If the configured password is correct, then please open a new issue.
