Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements DatadogAPM plugin #241

Merged
merged 8 commits into from
Aug 27, 2020
Merged

Implements DatadogAPM plugin #241

merged 8 commits into from
Aug 27, 2020

Conversation

urjitbhatia
Copy link
Contributor

@urjitbhatia urjitbhatia commented Aug 23, 2020

This PR implements the datadog plugin #60

I was able to successfully test this on our setup.
The check configuration looks like this:

check "dd_response_time" {
  source = "datadog"
  query  = "FROM=2m;TO=0m;QUERY=avg:proxy.backend.response.time{proxy-service:web-app}"
  strategy "target-value" {
    target = 18
  }
}

Since datadog metrics queries require a FROM and TO time, I am using a simple; delimited query string for dynamically choosing those values per check. Happy to take suggestions on that.

APM config looks like this:

apm "datadog" {
  driver = "datadog"
  config = {
    dd_api_key = "<api key>"
    dd_app_key = "<app key>"
  }
}

@hashicorp-cla
Copy link

hashicorp-cla commented Aug 23, 2020

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@jrasell jrasell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @urjitbhatia and thanks so much for this! I have no doubt this will get a lot of use right out of the gate.

The query syntax makes sense to me at first glance. I'll let other team members take a look once they are online to see if they have any additional thoughts or comments.

One overall change that might be nice is to split out the query parsing into its own function. This would then make it easier to adds tests to the parsing of user supplied queries. It my also be advantageous to define the query parts as constants to avoid the duplication.

plugins/builtin/apm/datadog/plugin/plugin.go Outdated Show resolved Hide resolved
plugins/builtin/apm/datadog/plugin/plugin.go Outdated Show resolved Hide resolved
plugins/builtin/apm/datadog/plugin/plugin.go Outdated Show resolved Hide resolved
Copy link
Contributor

@jippi jippi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my 50c on this (great!) PR is that the query syntax is not super intuitive.

Would it be possible to have TO/FROM being its own HCL config key/value pairs (like task -> config{}) that can be validated to be correct format on submission, easier to understand and allow the query value to identical to the value in the datadog explorer/monitoring UI, for easy copy+paste and fewer errors

plugins/builtin/apm/datadog/plugin/plugin.go Outdated Show resolved Hide resolved
ctx, cancel := context.WithTimeout(a.clientCtx, 10*time.Second)
defer cancel()

queryResult, _, err := a.client.MetricsApi.QueryMetrics(ctx).From(from).To(to).Query(query).Execute()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the error behavior for rate-limiting? I feel like those should have special handling and be very visible (ideally a time series you can alert on)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jippi I added a message with some rate limit info on a 429 https://github.com/hashicorp/nomad-autoscaler/pull/241/files#diff-40837c4aaa38b09f14082343a9fc46edR135

Lmk if you want a more elaborate solution there? Like holding a state with number of remaining requests etc and skipping making requests if we're over the quota in the current time period.

Not super sure I understand what you mean by (ideally a time series you can alert on).

For reference: https://docs.datadoghq.com/api/v1/#rate-limiting

@urjitbhatia
Copy link
Contributor Author

my 50c on this (great!) PR is that the query syntax is not super intuitive.

Would it be possible to have TO/FROM being its own HCL config key/value pairs (like task -> config{}) that can be validated to be correct format on submission, easier to understand and allow the query value to identical to the value in the datadog explorer/monitoring UI, for easy copy+paste and fewer errors

@jippi I'd really like that too but from what I understand, the plugin interface does not send any custom config values pulled from task -> config because that is controlled by nomad?
How about adding a config map to https://github.com/hashicorp/nomad-autoscaler/blob/master/policy/policy.go#L25 and then we can have something like:

check "dd_response_time" {
  source = "datadog"
  config {
     from = "2m"
     to      = "0m"
  }
  query  = "avg:proxy.backend.response.time{proxy-service:web-app}"
  strategy "target-value" {
    target = 18
  }
}

@lgfa29
Copy link
Contributor

lgfa29 commented Aug 24, 2020

my 50c on this (great!) PR is that the query syntax is not super intuitive.
Would it be possible to have TO/FROM being its own HCL config key/value pairs (like task -> config{}) that can be validated to be correct format on submission, easier to understand and allow the query value to identical to the value in the datadog explorer/monitoring UI, for easy copy+paste and fewer errors

@jippi I'd really like that too but from what I understand, the plugin interface does not send any custom config values pulled from task -> config because that is controlled by nomad?
How about adding a config map to https://github.com/hashicorp/nomad-autoscaler/blob/master/policy/policy.go#L25 and then we can have something like:

check "dd_response_time" {
  source = "datadog"
  config {
     from = "2m"
     to      = "0m"
  }
  query  = "avg:proxy.backend.response.time{proxy-service:web-app}"
  strategy "target-value" {
    target = 18
  }
}

That's right @urjitbhatia, our API for APMs is quite limited right now, but we have plans to improve it soon and allow for Query to receive a time range. So I think it's OK for you to hold off on making changes for now.

@urjitbhatia
Copy link
Contributor Author

@lgfa29 just wanted to double check if adding a config map to https://github.com/hashicorp/nomad-autoscaler/blob/master/policy/policy.go#L25 is possible.
nomad validate doesn't complain if I add this map in there and that'll make the config cleaner?

@lgfa29
Copy link
Contributor

lgfa29 commented Aug 24, 2020

@lgfa29 just wanted to double check if adding a config map to https://github.com/hashicorp/nomad-autoscaler/blob/master/policy/policy.go#L25 is possible.
nomad validate doesn't complain if I add this map in there and that'll make the config cleaner?

Yes, it's possible. Nomad ignores everything inside the policy stanza since it doesn't know how the Autoscaler uses it.

What I meant in my previous message is that we have some changes coming to the APM plugins interface, but we are not 100% sure how they will look like yet. So it's OK to leave the query structure as it is 👍

@jippi
Copy link
Contributor

jippi commented Aug 25, 2020

@lgfa29 @urjitbhatia Sorry it wasn't clear, I meant to have a config{} stanza inside the check{} stanza similar to task->config{} - so your pseudo example is exactly what I meant.

To me, that would be a fantastic interface for managing these settings

urjitbhatia and others added 2 commits August 25, 2020 12:37
DD_CLIENT_APP_KEY -> DD_APP_KEY
DD_CLIENT_API_KEY -> DD_API_KEY

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
@urjitbhatia
Copy link
Contributor Author

@lgfa29 @urjitbhatia Sorry it wasn't clear, I meant to have a config{} stanza inside the check{} stanza similar to task->config{} - so your pseudo example is exactly what I meant.

To me, that would be a fantastic interface for managing these settings

Yep, that sounds great! However @lgfa29 is suggesting leaving it as is for now till newer interface is planned. Happy to go whichever way as you guys decide :)

@lgfa29
Copy link
Contributor

lgfa29 commented Aug 26, 2020

That's right @urjitbhatia, let's leave the query as it is for now.

@jippi your suggestion makes a lot of sense and presents a better UX, but since the interface will change I think it's better to merge this now and update the query syntax later, once we have the new interface for APM plugins defined.

@jippi
Copy link
Contributor

jippi commented Aug 26, 2020

so many approvals, so little merging :D go go go 🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants