Implement Hasura Actions for updating AlertManager configs and validating Credential/Exporter configurations #373

nickbp · 2021-02-16T22:33:13Z

Had a meeting yesterday with @MatApple where he mentioned Hasura Actions: https://hasura.io/docs/1.0/graphql/core/actions/index.html

We might be able to add use them to embed validation at the point when data is submitted to GraphQL. For example, there's additional sanity checks for Credentials and Exporter configurations in the Go code - such as raising an error if the user tries to assign a GCP credential to a CloudWatch (AWS) exporter. But these sanity checks currently aren't run if the data is submitted directly to GraphQL, skipping the Go HTTP API. If we can embed them in the GraphQL layer, then the UI for example could reuse the same validation without needing to reimplement it in TypeScript.

Relates to: #310 (credentials/exporters epic), #314 (go API implementation issue)

nickbp · 2021-02-18T20:24:04Z

Something pointed out in the above issue is that it will likely be tricky for the UI to invoke the Go APIs, because they do not have access to the auth tokens that an end user is expected to use with curl commands. Given this, it'd be way easier for the UI to just talk to Hasura/graphql directly. Hence why we'd want Hasura to implement any config validation via actions if possible.

Figuring out whether this is feasible should be done soon to avoid blocking the UI on knowing what API to call. Assuming it works, the actual implementation of the validation isn't blocking, assuming that adding it would be transparent to the UI (and other graphql clients).

nickbp · 2021-02-23T04:52:35Z

It looks like this should be pretty straightforward to do and the Hasura docs specifically call out the idea of running validation before an insert/update is performed.

I still need to look a bit more into the following details:

Making sure that the Create[Credentials|Exporters] calls will work. I haven't seen any examples of an action that accepts an array of items, which is what Create[Credentials|Exporters] each expect. This should just be an issue of making sure the Action schema as defined by Hasura works. Should be fine, and I may have solved this already, but there's some risk and I won't know for sure until I've tried exercising the action.
Making sure that the insert or update is actually performed after the validation passes. The Hasura docs have been a bit contradictory on this. However there's an examples repo that may provide clear guidance on this.

Refs:

Docs describes actions that derive existing mutations, skips detail on how the mutation will still occur: https://hasura.io/docs/1.0/graphql/core/actions/derive.html
Separate doc around validation implies that the action needs to go back to hasura to do the insert? But then how do we avoid an infinite loop? This doesn't match up with other docs that make it sound like the insert/update would happen automatically: https://hasura.io/docs/1.0/graphql/core/schema/data-validations.html#using-hasura-actions
Examples repo containing self-contained examples of validating actions, can play around with these and check how the flow actually works: https://github.com/hasura/hasura-actions-examples/tree/master/data-validations

…373) Signed-off-by: Nick Parker <nick@opstrace.com>

nickbp · 2021-02-23T23:17:13Z

Just had a discussion with Nahum about doing a similar thing, using Hasura Actions fetching and storing alertmanager configs from the UI (#380). In that case, no follow-up calls to Postgres are needed, as the call would just be a wrapper around an existing Opstrace API. Given this it would be a good starting point to get the alertmanager calls working first, and then I can come back to the this credentials/exporters case where the data needs to also be written to Postgres.

In summary I'm going to briefly switch to working on #380 since it feels like that will end up accomplishing the first half of this ticket in the process, after which I can switch back to this and look into how to have an action that also writes to postgres afterwards.

Also note for future: If an action that also writes to Postgres isn't possible, then the alternate solution may be to have separate calls for "run validation and write to postgres" vs "just write to postgres", where the UI calls the first, which wraps to a Go endpoint that runs validation and calls the second.

terrcin · 2021-02-23T23:36:48Z

Also note for future: If an action that also writes to Postgres isn't possible, then the alternate solution may be to have separate calls for "run validation and write to postgres" vs "just write to postgres", where the UI calls the first, which wraps to a Go endpoint that runs validation and calls the second.

This wouldn't be a big deal as it'd be straight forward to abstract the calls away. Infact might be handy to have a validation only call in some places.

nickbp · 2021-02-23T23:38:25Z

Oh that's a good point actually. Maybe it'd be better to just have a separate "validate" call entirely and keep the "write" calls as-is. That would allow the UI to run the validation periodically while a config is being edited, without actually committing the config to storage.

MatApple · 2021-02-23T23:56:18Z

BTW, an interesting side effect of using Typescript for UI and apis is that validation code can also be shared. If only it were possible to use Go to create UIs :-)

Need to: - Implement Go action handlers in config-api (on separate optional hasura-only listen port) - Create `hasura-action-secret` Secret in controller (copy `hasura-admin-secret` handling) Signed-off-by: Nick Parker <nick@opstrace.com>

nickbp · 2021-03-10T01:19:45Z

Combining this with the (more urgent) support for getting/setting alertmanager configs, since they both touch the same code and need the same plumbing from hasura.

Had a meeting with Nahum to discuss how the GraphQL API would look. It turned out that Hasura Actions allow pretty free-form editing of the GraphQL structure. There is one limitation imposed by Hasura, that response types must be flat and cannot have nested types.

At this point, the status is:

The Hasura plumbing is complete and I'm successfully getting calls from Hasura, so the "high risk" integration part is complete. As a part of that Hasura and my endpoint are both configured with a common shared secret that's used to ensure that only Hasura itself can talk to the endpoint.
The Go code for the endpoint exposed to Hasura is implemented and compiles, but I haven't actually tried it yet. However it's just an HTTP endpoint/forwarder so it's not particularly complicated.
Update the code to reflect the new/better schema Nahum and I had worked through
Test that it works as expected when queried (can just test directly via Hasura console) and fix any issues in the Go code that come up
Rebase/create PR and fix any CI issues before merging to main

…#373) This should allow the UI to communicate with other things in the cluster, with Hasura/GraphQL acting as the common portal. - Adds plumbing to Hasura for querying the config-api service, using a shared secret to authenticate Hasura to the service - Implements calls for getting/setting the per-tenant alertmanager configuration in cortex - Implements calls for validating credential/exporter configs Signed-off-by: Nick Parker <nick@opstrace.com>

…xporters (#373) This should allow the UI to communicate with other things in the cluster, with Hasura/GraphQL acting as the common portal. - Adds plumbing to Hasura for querying the config-api service, using a shared secret to authenticate Hasura to the service - Implements calls for getting/setting the per-tenant alertmanager configuration in cortex - Implements calls for validating credential/exporter configs Signed-off-by: Nick Parker <nick@opstrace.com>

nickbp · 2021-03-10T23:44:31Z

The code's been updated and I've been able to test with everything deployed into a test cluster. CI is happy as well.

It looks like things work as expected. i just made some improvements to the error passthrough. cortex's schema errors aren't great but it should be okay for now.

getting config that doesn't exist yet

query MyQuery {
  getAlertmanager(tenant_id: "dev") {
    config
    online
    tenant_id
  }
}

{
  "data": {
    "getAlertmanager": {
      "config": "",
      "online": true,
      "tenant_id": "dev"
    }
  }
}

setting invalid config

mutation MyMutation {
  updateAlertmanager(tenant_id: "dev", input: {config: "this is very yaml"}) {
    error_message
    error_raw_response
    success
    error_type
  }
}

{
  "data": {
    "updateAlertmanager": {
      "error_message": "Alertmanager config validation failed",
      "error_raw_response": "error marshalling YAML Alertmanager config: yaml: unmarshal errors:\n  line 1: cannot unmarshal !!str `this is...` into alertmanager.UserConfig\n",
      "success": false,
      "error_type": "VALIDATION_FAILED"
    }
  }
}

setting valid config

mutation MyMutation {
  updateAlertmanager(tenant_id: "dev", input: {config: "alertmanager_config: |\n  global:\n    smtp_smarthost: 'localhost:25'\n    smtp_from: 'youraddress@example.org'\n  route:\n    receiver: example-email\n  receivers:\n    - name: example-email\n      email_configs:\n      - to: 'youraddress@example.org'\n"}) {
    error_message
    error_raw_response
    success
    error_type
  }
}

{
  "data": {
    "updateAlertmanager": {
      "error_message": null,
      "error_raw_response": null,
      "success": true,
      "error_type": null
    }
  }
}

getting config that was just set (note: isn't exact string match, cortex seems to insert blank template_files at root)

query MyQuery {
  getAlertmanager(tenant_id: "dev") {
    config
    online
    tenant_id
  }
}

{
  "data": {
    "getAlertmanager": {
      "config": "template_files: {}\nalertmanager_config: |\n  global:\n    smtp_smarthost: 'localhost:25'\n    smtp_from: 'youraddress@example.org'\n  route:\n    receiver: example-email\n  receivers:\n    - name: example-email\n      email_configs:\n      - to: 'youraddress@example.org'\n",
      "online": true,
      "tenant_id": "dev"
    }
  }
}

…xporters (#373) (#461) * config: Implement Hasura Actions for alertmanager get/set and creds/exporters (#373) This should allow the UI to communicate with other things in the cluster, with Hasura/GraphQL acting as the common portal. - Adds plumbing to Hasura for querying the config-api service, using a shared secret to authenticate Hasura to the service - Implements calls for getting/setting the per-tenant alertmanager configuration in cortex - Implements calls for validating credential/exporter configs Signed-off-by: Nick Parker <nick@opstrace.com> * controller: Sync graphql schema, update graphql/app to reflect added Hasura Actions Signed-off-by: Nick Parker <nick@opstrace.com> * config: Pass-through HTTP body (containing original errors) on error response Signed-off-by: Nick Parker <nick@opstrace.com>

nickbp added the backlog state (used by codetree) label Feb 16, 2021

nickbp added this to the Sprint: Week 06 (2021) milestone Feb 16, 2021

nickbp self-assigned this Feb 16, 2021

opstracy mentioned this issue Feb 16, 2021

Collect metrics and logs from CloudWatch and Stackdriver #310

Closed

triclambert modified the milestones: Sprint: Week 06 (2021), Sprint: Week 07 (2021) Feb 17, 2021

nickbp mentioned this issue Feb 18, 2021

Move alertmanager and credential/exporter config APIs into common /v1/ path under the root cluster domain #379

Closed

nickbp pushed a commit that referenced this issue Feb 23, 2021

wip: Implementing credential/exporter validation via Hasura Actions (#…

daf9343

…373) Signed-off-by: Nick Parker <nick@opstrace.com>

nickbp added in progress state (used by codetree) and removed backlog state (used by codetree) labels Feb 23, 2021

This was referenced Feb 23, 2021

Update backend to deploy cloudwatch/stackdriver into tenant namespace, get prometheus scraping working #312

Closed

UI: Create an interface that allows easy setup of the Cortex Alertmanager configuration #380

Closed

MatApple closed this as completed Feb 23, 2021

opstracy removed the in progress state (used by codetree) label Feb 23, 2021

MatApple reopened this Feb 23, 2021

nickbp added the in progress state (used by codetree) label Feb 25, 2021

nickbp changed the title ~~Investigate use of Hasura Actions for validating Credential/Exporter configurations~~ Implement Hasura Actions for validating Credential/Exporter configurations Mar 10, 2021

nickbp changed the title ~~Implement Hasura Actions for validating Credential/Exporter configurations~~ Implement Hasura Actions for updating AlertManager configs and validating Credential/Exporter configurations Mar 10, 2021

nickbp closed this as completed Mar 11, 2021

nickbp removed the in progress state (used by codetree) label Mar 11, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement Hasura Actions for updating AlertManager configs and validating Credential/Exporter configurations #373

Implement Hasura Actions for updating AlertManager configs and validating Credential/Exporter configurations #373

nickbp commented Feb 16, 2021

nickbp commented Feb 18, 2021

nickbp commented Feb 23, 2021 •

edited

nickbp commented Feb 23, 2021

terrcin commented Feb 23, 2021 •

edited

nickbp commented Feb 23, 2021

MatApple commented Feb 23, 2021 •

edited

nickbp commented Mar 10, 2021

nickbp commented Mar 10, 2021 •

edited

Implement Hasura Actions for updating AlertManager configs and validating Credential/Exporter configurations #373

Implement Hasura Actions for updating AlertManager configs and validating Credential/Exporter configurations #373

Comments

nickbp commented Feb 16, 2021

nickbp commented Feb 18, 2021

nickbp commented Feb 23, 2021 • edited

nickbp commented Feb 23, 2021

terrcin commented Feb 23, 2021 • edited

nickbp commented Feb 23, 2021

MatApple commented Feb 23, 2021 • edited

nickbp commented Mar 10, 2021

nickbp commented Mar 10, 2021 • edited

nickbp commented Feb 23, 2021 •

edited

terrcin commented Feb 23, 2021 •

edited

MatApple commented Feb 23, 2021 •

edited

nickbp commented Mar 10, 2021 •

edited