Skip to content

SMPROD-1309: Example for restoring Alerts now removes non-existent notification channel IDs #54

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 25, 2018

Conversation

philrz
Copy link
Contributor

@philrz philrz commented May 19, 2018

Someone might be restoring Alert configs from another environment, in which case the Notification Channel IDs in the saved Alert JSON is not expected to match the Notification Channel IDs in the target environment. This enhancement drops the non-matching IDs but still restores the Alert config.

To support this, the get_notification_ids() method is enhanced such that if no channels filter is provided, it just lists all IDs. It used to only support returning a filtered list based on type/name of channels.

@philrz philrz requested review from Chobicus and mstemm May 19, 2018 00:47
found = False
for ch in res.json()["notificationChannels"]:
if c['type'] == ch['type']:
if c['type'] == 'SNS':
Copy link

@Chobicus Chobicus May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I don't think it should be part of this review but VictorOps is missing from this list of types. It can be compared the same way as OpsGenie, by name.

@mstemm
Copy link
Contributor

mstemm commented May 22, 2018

I don't think I know enough about how the monitor side of the structures works to properly comment on this. Are the notification ids supposed to be globally unique, although the set of them may not be the same from one environment to another?

Copy link

@Chobicus Chobicus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing a review with the disclaimer that I am "level 1 python user".
Code looks good but when I tried running the restore of alerts:

  • python 2.7 complains on a['description'] += ' (created via restore_alerts.py)' TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'
  • python 3.6 complains on SyntaxError: Missing parentheses in call to 'print'.

@Chobicus
Copy link

Ok, got it. 2.7 should be used.
My exported alert have no description so concatenation of None and str is tried.

@philrz
Copy link
Contributor Author

philrz commented May 23, 2018

@Chobicus: Can you share your exported alert? I'd run into something similar when first revisiting this method, which is why this branch contains the if 'description' in a: line. That caught the case of an Alert I created in the UI but never typed any Description text before exporting it: In my case the exported JSON simply had no description entry at all, e.g.:

{
    "alerts": [
        {
            "autoCreated": false,
            "condition": "avg(avg(cpu.used.percent)) > 50",
            "createdOn": 1527036277000,
            "enabled": true,
            "id": 1240280,
            "modifiedOn": 1527036277000,
            "name": "New Metric Alert",
            "notificationCount": 0,
            "rateOfChange": false,
            "severity": 4,
            "teamId": 5278,
            "timespan": 600000000,
            "type": "MANUAL",
            "version": 1
        }
    ]

Your error message makes it seem like you had a description but it was null in the JSON for some reason. I can certainly catch that case as well, but now I'm curious when the backend would return that.

Perhaps you could also respond to @mstemm's question above. Based on my black box testing, I think they're going to be globally unique within a given environment. I say this because I created two separate on-prem environments, and the first Notification Channel created in each had ID of 1. Meanwhile a new "customer" I created for someone in app-staging SaaS has ID 712 for their first, which leads me to assume they're being assigned as next-highest-integer out of a global pool.

@mstemm's comment about "the set of them may not be the same from one environment to another" reminds me that I should explain the logic a bit more. The use case that motivated the original example was a customer who wanted to keep their Alerts config always under revision control is JSON objects in GitHub and never configure them through the UI. In this regard, they'd always expect the same Notification Channels IDs to be present and re-used each time the Alerts are "updated" from the JSON. Meanwhile, a separate user stumbled onto this same example and pulled the Alerts down in one environment and tried to restore them into another. Assuming SaaS and the globally unique pool of IDs, this was guaranteed to fail, hence the approach of just silently skipping over IDs that don't exist. In a Slack discussion I also floated the idea of making a new option so the user could explicitly decide they want to drop all IDs on restore, but @adityakinifr seemed to like the silent approach I'm using here and it was easy to implement, so I just went with it.

I recognize this leaves open the possibility using this example to download/restore between two on-prem environments, they might end up restoring with enabled Notification Channel IDs they didn't intend (e.g. if ID 2 in one environment was an email channel but 2 was a Slack channel in the other). Since it's ultimately just a showcase of API functionality, though, I didn't try to be a perfectionist here. My main motivation was just to make sure someone didn't see the script bomb with an opaque error message.

@Chobicus
Copy link

Chobicus commented May 23, 2018

@philrz Here is one of the alerts I used:

{
  "alerts": [
    {
      "version": 1,
      "name": "Machine down",
      "description": null,
      "teamId": 7856,
      "enabled": true,
      "filter": "host.mac = \"6c:62:6d:46:a9:b6\"",
      "type": "MANUAL",
      "condition": "avg(timeAvg(uptime)) = 0",
      "timespan": 600000000,
      "severity": 4,
      "notificationChannelIds": [
        10393
      ],
      "segmentBy": [
        "host.hostName"
      ],
      "segmentCondition": {
        "type": "ANY"
      },
      "id": 1174562,
      "createdOn": "2018-01-20T17:13:24.000Z",
      "modifiedOn": "2018-01-20T17:13:24.000Z"
    }
...

and you are right. The description has value of null. I exported the alerts using UI "Export JSON" button on alerts page.
What happens is that backend returns JSON without description entry but the code that does the export, on the UI side, adds the missing field with the value of null.

@mstemm I can just confirm that Phil is again right. Notification channels get an auto-incremented id during the creation which are unique in an environment.

@philrz
Copy link
Contributor Author

philrz commented May 24, 2018

@Chobicus: Thanks for the additional info. When working on these examples, I'd only considered the possibility of the Alerts JSON having come from the list_alerts.py example, which preserves the original object as delivered from the backend with the description missing. Once I went in there to catch the case of "description":null, that revealed the next snag: Those Alert objects that came through the web UI also have the createdOn and modifiedOn represented as ISO string timestamps rather than the epoch time the backend gives/expects. I handled all these cases in a commit I just pushed to this branch, so feel free to give it another go.

Copy link

@Chobicus Chobicus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on time formats.

I just noticed that check if description exists is needed also during the alert creation before line 83.

@philrz
Copy link
Contributor Author

philrz commented May 24, 2018

@Chobicus: Thanks for spotting that. I've pushed another commit that addresses that. Maybe we're done? :)

@Chobicus Chobicus merged commit 32d51d8 into master May 25, 2018
@Chobicus Chobicus deleted the restore-alerts-without-notifications branch May 25, 2018 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants