Skip to content
This repository has been archived by the owner on Jan 12, 2022. It is now read-only.

Implement Hasura Actions for updating AlertManager configs and validating Credential/Exporter configurations #373

Closed
nickbp opened this issue Feb 16, 2021 · 8 comments
Assignees

Comments

@nickbp
Copy link
Contributor

nickbp commented Feb 16, 2021

Had a meeting yesterday with @MatApple where he mentioned Hasura Actions: https://hasura.io/docs/1.0/graphql/core/actions/index.html

We might be able to add use them to embed validation at the point when data is submitted to GraphQL. For example, there's additional sanity checks for Credentials and Exporter configurations in the Go code - such as raising an error if the user tries to assign a GCP credential to a CloudWatch (AWS) exporter. But these sanity checks currently aren't run if the data is submitted directly to GraphQL, skipping the Go HTTP API. If we can embed them in the GraphQL layer, then the UI for example could reuse the same validation without needing to reimplement it in TypeScript.

Relates to: #310 (credentials/exporters epic), #314 (go API implementation issue)

@nickbp
Copy link
Contributor Author

nickbp commented Feb 18, 2021

Something pointed out in the above issue is that it will likely be tricky for the UI to invoke the Go APIs, because they do not have access to the auth tokens that an end user is expected to use with curl commands. Given this, it'd be way easier for the UI to just talk to Hasura/graphql directly. Hence why we'd want Hasura to implement any config validation via actions if possible.

Figuring out whether this is feasible should be done soon to avoid blocking the UI on knowing what API to call. Assuming it works, the actual implementation of the validation isn't blocking, assuming that adding it would be transparent to the UI (and other graphql clients).

@nickbp
Copy link
Contributor Author

nickbp commented Feb 23, 2021

It looks like this should be pretty straightforward to do and the Hasura docs specifically call out the idea of running validation before an insert/update is performed.

I still need to look a bit more into the following details:

  • Making sure that the Create[Credentials|Exporters] calls will work. I haven't seen any examples of an action that accepts an array of items, which is what Create[Credentials|Exporters] each expect. This should just be an issue of making sure the Action schema as defined by Hasura works. Should be fine, and I may have solved this already, but there's some risk and I won't know for sure until I've tried exercising the action.
  • Making sure that the insert or update is actually performed after the validation passes. The Hasura docs have been a bit contradictory on this. However there's an examples repo that may provide clear guidance on this.

Refs:

nickbp pushed a commit that referenced this issue Feb 23, 2021
@nickbp nickbp added in progress state (used by codetree) and removed backlog state (used by codetree) labels Feb 23, 2021
@nickbp
Copy link
Contributor Author

nickbp commented Feb 23, 2021

Just had a discussion with Nahum about doing a similar thing, using Hasura Actions fetching and storing alertmanager configs from the UI (#380). In that case, no follow-up calls to Postgres are needed, as the call would just be a wrapper around an existing Opstrace API. Given this it would be a good starting point to get the alertmanager calls working first, and then I can come back to the this credentials/exporters case where the data needs to also be written to Postgres.

In summary I'm going to briefly switch to working on #380 since it feels like that will end up accomplishing the first half of this ticket in the process, after which I can switch back to this and look into how to have an action that also writes to postgres afterwards.

Also note for future: If an action that also writes to Postgres isn't possible, then the alternate solution may be to have separate calls for "run validation and write to postgres" vs "just write to postgres", where the UI calls the first, which wraps to a Go endpoint that runs validation and calls the second.

@terrcin
Copy link
Contributor

terrcin commented Feb 23, 2021

Also note for future: If an action that also writes to Postgres isn't possible, then the alternate solution may be to have separate calls for "run validation and write to postgres" vs "just write to postgres", where the UI calls the first, which wraps to a Go endpoint that runs validation and calls the second.

This wouldn't be a big deal as it'd be straight forward to abstract the calls away. Infact might be handy to have a validation only call in some places.

@nickbp
Copy link
Contributor Author

nickbp commented Feb 23, 2021

Oh that's a good point actually. Maybe it'd be better to just have a separate "validate" call entirely and keep the "write" calls as-is. That would allow the UI to run the validation periodically while a config is being edited, without actually committing the config to storage.

@MatApple
Copy link
Contributor

MatApple commented Feb 23, 2021

BTW, an interesting side effect of using Typescript for UI and apis is that validation code can also be shared. If only it were possible to use Go to create UIs :-)

@opstracy opstracy removed the in progress state (used by codetree) label Feb 23, 2021
@MatApple MatApple reopened this Feb 23, 2021
@nickbp nickbp added the in progress state (used by codetree) label Feb 25, 2021
nickbp pushed a commit that referenced this issue Feb 25, 2021
Need to:
- Implement Go action handlers in config-api (on separate optional hasura-only listen port)
- Create `hasura-action-secret` Secret in controller (copy `hasura-admin-secret` handling)

Signed-off-by: Nick Parker <nick@opstrace.com>
nickbp pushed a commit that referenced this issue Feb 25, 2021
Need to:
- Implement Go action handlers in config-api (on separate optional hasura-only listen port)
- Create `hasura-action-secret` Secret in controller (copy `hasura-admin-secret` handling)

Signed-off-by: Nick Parker <nick@opstrace.com>
nickbp pushed a commit that referenced this issue Mar 9, 2021
Need to:
- Implement Go action handlers in config-api (on separate optional hasura-only listen port)
- Create `hasura-action-secret` Secret in controller (copy `hasura-admin-secret` handling)

Signed-off-by: Nick Parker <nick@opstrace.com>
@nickbp nickbp changed the title Investigate use of Hasura Actions for validating Credential/Exporter configurations Implement Hasura Actions for validating Credential/Exporter configurations Mar 10, 2021
@nickbp nickbp changed the title Implement Hasura Actions for validating Credential/Exporter configurations Implement Hasura Actions for updating AlertManager configs and validating Credential/Exporter configurations Mar 10, 2021
@nickbp
Copy link
Contributor Author

nickbp commented Mar 10, 2021

Combining this with the (more urgent) support for getting/setting alertmanager configs, since they both touch the same code and need the same plumbing from hasura.

Had a meeting with Nahum to discuss how the GraphQL API would look. It turned out that Hasura Actions allow pretty free-form editing of the GraphQL structure. There is one limitation imposed by Hasura, that response types must be flat and cannot have nested types.

At this point, the status is:

  • The Hasura plumbing is complete and I'm successfully getting calls from Hasura, so the "high risk" integration part is complete. As a part of that Hasura and my endpoint are both configured with a common shared secret that's used to ensure that only Hasura itself can talk to the endpoint.
  • The Go code for the endpoint exposed to Hasura is implemented and compiles, but I haven't actually tried it yet. However it's just an HTTP endpoint/forwarder so it's not particularly complicated.
  • Update the code to reflect the new/better schema Nahum and I had worked through
  • Test that it works as expected when queried (can just test directly via Hasura console) and fix any issues in the Go code that come up
  • Rebase/create PR and fix any CI issues before merging to main

nickbp pushed a commit that referenced this issue Mar 10, 2021
…#373)

This should allow the UI to communicate with other things in the cluster, with Hasura/GraphQL acting as the common portal.

- Adds plumbing to Hasura for querying the config-api service, using a shared secret to authenticate Hasura to the service
- Implements calls for getting/setting the per-tenant alertmanager configuration in cortex
- Implements calls for validating credential/exporter configs

Signed-off-by: Nick Parker <nick@opstrace.com>
nickbp pushed a commit that referenced this issue Mar 10, 2021
…xporters (#373)

This should allow the UI to communicate with other things in the cluster, with Hasura/GraphQL acting as the common portal.

- Adds plumbing to Hasura for querying the config-api service, using a shared secret to authenticate Hasura to the service
- Implements calls for getting/setting the per-tenant alertmanager configuration in cortex
- Implements calls for validating credential/exporter configs

Signed-off-by: Nick Parker <nick@opstrace.com>
nickbp pushed a commit that referenced this issue Mar 10, 2021
…xporters (#373)

This should allow the UI to communicate with other things in the cluster, with Hasura/GraphQL acting as the common portal.

- Adds plumbing to Hasura for querying the config-api service, using a shared secret to authenticate Hasura to the service
- Implements calls for getting/setting the per-tenant alertmanager configuration in cortex
- Implements calls for validating credential/exporter configs

Signed-off-by: Nick Parker <nick@opstrace.com>
@nickbp
Copy link
Contributor Author

nickbp commented Mar 10, 2021

The code's been updated and I've been able to test with everything deployed into a test cluster. CI is happy as well.

It looks like things work as expected. i just made some improvements to the error passthrough. cortex's schema errors aren't great but it should be okay for now.

getting config that doesn't exist yet

query MyQuery {
  getAlertmanager(tenant_id: "dev") {
    config
    online
    tenant_id
  }
}

{
  "data": {
    "getAlertmanager": {
      "config": "",
      "online": true,
      "tenant_id": "dev"
    }
  }
}

setting invalid config

mutation MyMutation {
  updateAlertmanager(tenant_id: "dev", input: {config: "this is very yaml"}) {
    error_message
    error_raw_response
    success
    error_type
  }
}

{
  "data": {
    "updateAlertmanager": {
      "error_message": "Alertmanager config validation failed",
      "error_raw_response": "error marshalling YAML Alertmanager config: yaml: unmarshal errors:\n  line 1: cannot unmarshal !!str `this is...` into alertmanager.UserConfig\n",
      "success": false,
      "error_type": "VALIDATION_FAILED"
    }
  }
}

setting valid config

mutation MyMutation {
  updateAlertmanager(tenant_id: "dev", input: {config: "alertmanager_config: |\n  global:\n    smtp_smarthost: 'localhost:25'\n    smtp_from: 'youraddress@example.org'\n  route:\n    receiver: example-email\n  receivers:\n    - name: example-email\n      email_configs:\n      - to: 'youraddress@example.org'\n"}) {
    error_message
    error_raw_response
    success
    error_type
  }
}

{
  "data": {
    "updateAlertmanager": {
      "error_message": null,
      "error_raw_response": null,
      "success": true,
      "error_type": null
    }
  }
}

getting config that was just set (note: isn't exact string match, cortex seems to insert blank template_files at root)

query MyQuery {
  getAlertmanager(tenant_id: "dev") {
    config
    online
    tenant_id
  }
}

{
  "data": {
    "getAlertmanager": {
      "config": "template_files: {}\nalertmanager_config: |\n  global:\n    smtp_smarthost: 'localhost:25'\n    smtp_from: 'youraddress@example.org'\n  route:\n    receiver: example-email\n  receivers:\n    - name: example-email\n      email_configs:\n      - to: 'youraddress@example.org'\n",
      "online": true,
      "tenant_id": "dev"
    }
  }
}

nickbp pushed a commit that referenced this issue Mar 11, 2021
…xporters (#373) (#461)

* config: Implement Hasura Actions for alertmanager get/set and creds/exporters (#373)

This should allow the UI to communicate with other things in the cluster, with Hasura/GraphQL acting as the common portal.

- Adds plumbing to Hasura for querying the config-api service, using a shared secret to authenticate Hasura to the service
- Implements calls for getting/setting the per-tenant alertmanager configuration in cortex
- Implements calls for validating credential/exporter configs

Signed-off-by: Nick Parker <nick@opstrace.com>

* controller: Sync graphql schema, update graphql/app to reflect added Hasura Actions

Signed-off-by: Nick Parker <nick@opstrace.com>

* config: Pass-through HTTP body (containing original errors) on error response

Signed-off-by: Nick Parker <nick@opstrace.com>
@nickbp nickbp closed this as completed Mar 11, 2021
@nickbp nickbp removed the in progress state (used by codetree) label Mar 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

5 participants