From 2729ab282b0b3f1832a522d7d8010a8443f32d70 Mon Sep 17 00:00:00 2001
From: Justin Field
Date: Mon, 21 Oct 2019 15:25:35 -0700
Subject: [PATCH] chore(docs): add documentation on the object structure of canary config. (#622)

* chore(docs): add documentation on the object structure of canary config.

* chore(docs): Add id to the canary-config doc and add docs for New Relic Insights Metric Set Query Config.

* fix(docs): Fix typo in URL

Co-Authored-By: nisanharamati

* fix(docs): Fix typo in URL

Co-Authored-By: nisanharamati

* fix(docs): Add anchor tag to link.

Co-Authored-By: nisanharamati

* chore: Update README.md

Co-Authored-By: Chris Sanden
---
 README.md                                  |   3 +
 docs/canary-config.md                      | 100 ++++++++++++++++++
 .../docs/metric-set-query-config.md        |  10 ++
 .../NewRelicQueryBuilderServiceSpec.groovy |   8 +-
 kayenta-signalfx/README.md                 |   2 +-
 .../docs/metric-set-query-config.md        |  31 ++++++
 kayenta-signalfx/metric-query-config.md    |  27 -----
 7 files changed, 149 insertions(+), 32 deletions(-)
 create mode 100644 docs/canary-config.md
 create mode 100644 kayenta-newrelic-insights/docs/metric-set-query-config.md
 create mode 100644 kayenta-signalfx/docs/metric-set-query-config.md
 delete mode 100644 kayenta-signalfx/metric-query-config.md

diff --git a/README.md b/README.md
index 3942a9bb9..514976a7c 100644
--- a/README.md
+++ b/README.md
@@ -12,6 +12,9 @@ The quality of the canary version is assessed by comparing key metrics that desc
 
 Canaries are usually run against deployments containing changes to code, but they can also be used for operational changes, including changes to configuration.
 
+### Creating Canary Config
+Check out the [Canary Config Object model](docs/canary-config.md) for how a canary config is defined in [Markdown Syntax for Object Notation (MSON)](https://github.com/apiaryio/mson).
+
 ### Debugging
 
 To start the JVM in debug mode, set the Java system property `DEBUG=true`:
diff --git a/docs/canary-config.md b/docs/canary-config.md
new file mode 100644
index 000000000..4285aafa2
--- /dev/null
+++ b/docs/canary-config.md
@@ -0,0 +1,100 @@
+Objects in this document are documented using [Markdown Syntax for Object Notation (MSON)].
+
+### Canary Config Object model (object)
+
+#### Properties
+- `id` **some-custom-id** (string, optional) - If not supplied, a GUID will be generated for you. However, you can supply a custom string here. The id is used when you call Kayenta to trigger canary execution, if you do not want to supply the config as part of the request.
+- `name` **my-app golden signals canary config** (string, required) - Name for canary configuration.
+- `description` **Canary config for my-app** (string, required) - Description for the canary configuration.
+- `applications` (array[string], required) - A list of applications that the canary is for. You can just have a list with the single item `ad-hoc` as the entry, unless you are storing the configuration in Kayenta and sharing it.
+- `judge` ([CanaryJudgeConfig](#canary-judge-config), required) - Judge configuration.
+- `metrics` (array([CanaryMetricConfig](#canary-metric-config))) - List of metrics to analyze.
+- `templates` (map, optional) - Templates allow you to compose and parameterize advanced queries against your telemetry provider. Parameterized queries are hydrated by values provided in the canary stage. The project, resourceType, scope, and location variable bindings are implicitly available. For example, you can interpolate project using the following syntax: \${project}.
+- `classifier` ([CanaryClassifierConfig](#canary-classifier-config), required) - The classification configuration, such as group weights.
+
+
+### CanaryJudgeConfig (object)
+Currently there is one judge, and this object should be static across all configurations (see the above examples).
+#### Properties
+- `name` **NetflixACAJudge-v1.0** (string, required) - Judge to use. As of right now there is only `NetflixACAJudge-v1.0`.
+- `judgeConfigurations` **{}** (object, required) - Map of judgement configuration. This should always be an empty object as of right now.
+
+
+### CanaryMetricConfig (object)
+Describes a metric that will be used in determining the health of a canary.
+#### Properties
+- `name` **http errors** (string, required) - Human-readable name of the metric under test.
+- `query` (enum[[CanaryMetricSetQueryConfig](#canary-metrics-set-query-config)], required) - Query config object for your metric source type.
+- `groups` (array[string], required) - List of metrics groups that this metric will belong to.
+- `analysisConfigurations` ([AnalysisConfiguration](#analysis-configuration), required) - Analysis configuration, describes how to judge a given metric.
+- `scopeName` (enum[string], required)
+    - `default` - only accepted value here
+
+
+### CanaryMetricSetQueryConfig (object)
+Metric source interface for describing how to query for a given metric / metric source.
+#### Properties
+- One of
+    - AtlasCanaryMetricSetQueryConfig
+    - DatadogCanaryMetricSetQueryConfig
+    - GraphiteCanaryMetricSetQueryConfig
+    - InfluxdbCanaryMetricSetQueryConfig
+    - [NewRelicInsightsCanaryMetricSetQueryConfig](../kayenta-newrelic-insights/docs/metric-set-query-config.md)
+    - PrometheusCanaryMetricSetQueryConfig
+    - [SignalFxCanaryMetricSetQueryConfig](../kayenta-signalfx/docs/metric-set-query-config.md)
+    - StackdriverCanaryMetricSetQueryConfig
+    - WavefrontCanaryMetricSetQueryConfig
+
+
+### AnalysisConfiguration (object)
+Wrapper object that includes the Canary Analysis Configuration and describes how to judge a given metric.
+#### Properties
+- `canary` ([CanaryAnalysisConfiguration](#canary-analysis-configuration))
+
+
+### CanaryAnalysisConfiguration (object)
+Describes how to judge a metric, see the [Netflix Automated Canary Analysis Judge] for more information.
+#### Properties
+- `direction` (enum[string], required) - Which direction of statistical change triggers the metric to fail.
+    - `increase` - Use when you want the canary to fail only if it is significantly higher than the baseline (error counts, memory usage, etc, where a decrease is not a failure).
+    - `decrease` - Use when you want the canary to fail only if it is significantly lower than the baseline (success counts, etc, where a larger number is not a failure).
+    - `either` - Use when you want the canary to fail if it is significantly higher or lower than the baseline.
+- `nanStrategy` (enum[string], required) - How to handle NaN values which can occur if the metric does not return data for a particular time interval.
+    - `remove` - Use when you expect a metric to always have data and you want the NaNs removed from your data set (usage metrics).
+    - `replace` - Use when you expect a metric to return no data in certain use cases and you want the NaNs replaced with zeros (for example: count metrics, if no errors happened, then metric will return no data for that time interval).
+- `critical` **true** (boolean, optional) - Use to fail the entire canary if this metric fails (recommended for important metrics that signal service outages or severe problems).
+- `mustHaveData` **true** (boolean, optional) - Use to fail a metric if data is missing.
+- `effectSize` ([EffectSize](#effect-size), optional) - Controls how different the metric needs to be to fail or fail critically.
+
+
+### EffectSize
+Controls the degree of statistical significance the metric needs to fail or fail critically.
+Metrics marked as critical can also define `criticalIncrease` and `criticalDecrease`.
+See the [Netflix Automated Canary Analysis Judge] and [Mann Whitney Classifier] classes for more information.
+
+#### Properties
+- `allowedIncrease` **1.1** (number, optional) - Defaults to 1. The multiplier increase that must be met for the metric to fail. This example makes the metric fail when the metric has increased 10% from the baseline.
+- `allowedDecrease` **0.90** (number, optional) - Defaults to 1. The multiplier decrease that must be met for the metric to fail. This example makes the metric fail when the metric has decreased 10% from the baseline.
+- `criticalIncrease` **5.0** (number, optional) - Defaults to 1. The multiplier increase that must be met for the metric to be a critical failure and fail the entire analysis with a score of 0. This example makes the canary fail critically if there is a 5x increase.
+- `criticalDecrease` **0.5** (number, optional) - Defaults to 1. The multiplier decrease that must be met for the metric to be a critical failure and fail the entire analysis with a score of 0. This example makes the canary fail critically if there is a 50% decrease.
+
+
+### CanaryClassifierConfig
+#### Properties
+- `groupWeights` (enum[string], required)
+    - `groups` **"Latency" : 33** (object, required) - List of each metrics group along with its corresponding weight. Weights must total 100.
+
+
+## Links
+- [Spinnaker Canary Best Practices]
+- [Canary analysis: Lessons learned and best practices from Google and Waze]
+- [Armory Kayenta Documentation]
+- [Example Signalfx canary config]
+
+[Spinnaker Canary Best Practices]: https://www.spinnaker.io/guides/user/canary/best-practices/
+[Armory Kayenta Documentation]: https://docs.armory.io/spinnaker/configure_kayenta/
+[Example Signalfx canary config]: https://github.com/spinnaker/kayenta/blob/master/kayenta-signalfx/metric-query-config.md
+[Markdown Syntax for Object Notation (MSON)]: https://github.com/apiaryio/mson
+[Canary analysis: Lessons learned and best practices from Google and Waze]: https://cloud.google.com/blog/products/devops-sre/canary-analysis-lessons-learned-and-best-practices-from-google-and-waze
+[Netflix Automated Canary Analysis Judge]: https://github.com/spinnaker/kayenta/blob/master/kayenta-judge/src/main/scala/com/netflix/kayenta/judge/NetflixACAJudge.scala
+[Mann Whitney Classifier]: https://github.com/spinnaker/kayenta/blob/master/kayenta-judge/src/main/scala/com/netflix/kayenta/judge/classifiers/metric/MannWhitneyClassifier.scala
diff --git a/kayenta-newrelic-insights/docs/metric-set-query-config.md b/kayenta-newrelic-insights/docs/metric-set-query-config.md
new file mode 100644
index 000000000..1e12a6661
--- /dev/null
+++ b/kayenta-newrelic-insights/docs/metric-set-query-config.md
@@ -0,0 +1,10 @@
+### NewRelicCanaryMetricSetQueryConfig (CanaryMetricSetQueryConfig)
+New Relic Insights specific query configurations.
+#### Properties
+- `select` **SELECT count(\*) FROM Transaction** (string, optional) - The full select query component of the NRQL statement.
+- `q` **httpStatusCode LIKE '5%'** (string, optional) - NRQL query segment for the WHERE clause. See the [NRQL Docs](https://docs.newrelic.com/docs/query-data/nrql-new-relic-query-language/getting-started/nrql-syntax-components-functions)
+- `customInlineTemplate` **SELECT count(\*) FROM Transaction TIMESERIES 60 seconds SINCE ${startEpochSeconds} UNTIL ${endEpochSeconds} WHERE httpStatusCode LIKE '5%' AND someKeyThatIsSetDuringDeployment LIKE '${someKeyThatWasProvidedInExtendedScopeParams}' AND autoScalingGroupName LIKE '${scope}' AND region LIKE '${location}'** (string, optional) - Custom inline template. Use this or `select` + `q`. It allows you to write your own NRQL; note that your NRQL must use the TIMESERIES keyword.
+
+- `type` (enum[string], required)
+    - `newrelic`
+
diff --git a/kayenta-newrelic-insights/src/test/groovy/com/netflix/kayenta/newrelic/metrics/NewRelicQueryBuilderServiceSpec.groovy b/kayenta-newrelic-insights/src/test/groovy/com/netflix/kayenta/newrelic/metrics/NewRelicQueryBuilderServiceSpec.groovy
index 8f22c85a9..1dc910faa 100644
--- a/kayenta-newrelic-insights/src/test/groovy/com/netflix/kayenta/newrelic/metrics/NewRelicQueryBuilderServiceSpec.groovy
+++ b/kayenta-newrelic-insights/src/test/groovy/com/netflix/kayenta/newrelic/metrics/NewRelicQueryBuilderServiceSpec.groovy
@@ -102,7 +102,7 @@ class NewRelicQueryBuilderServiceSpec extends Specification {
           "FROM Transaction " +
           "TIMESERIES 60 seconds " +
           "SINCE \${startEpochSeconds} UNTIL \${endEpochSeconds} " +
-          "WHERE httpStatusCode LIKE '5% " +
+          "WHERE httpStatusCode LIKE '5%' " +
           "AND autoScalingGroupName LIKE '\${scope}' " +
           "AND region LIKE '\${location}'",
         expectedQuery :
@@ -110,7 +110,7 @@ class NewRelicQueryBuilderServiceSpec extends Specification {
           "FROM Transaction " +
           "TIMESERIES 60 seconds " +
           "SINCE ${start.epochSecond} UNTIL ${end.epochSecond} " +
-          "WHERE httpStatusCode LIKE '5% " +
+          "WHERE httpStatusCode LIKE '5%' " +
           "AND autoScalingGroupName LIKE 'myservice-prod-v01' " +
           "AND region LIKE 'us-west-2'"
       ],
@@ -124,7 +124,7 @@ class NewRelicQueryBuilderServiceSpec extends Specification {
           "FROM Transaction " +
           "TIMESERIES 60 seconds " +
           "SINCE \${startEpochSeconds} UNTIL \${endEpochSeconds} " +
-          "WHERE httpStatusCode LIKE '5% " +
+          "WHERE httpStatusCode LIKE '5%' " +
           "AND someKeyThatIsSetDuringDeployment LIKE '\${someKey}' " +
           "AND autoScalingGroupName LIKE '\${scope}' " +
           "AND region LIKE '\${location}'",
@@ -133,7 +133,7 @@ class NewRelicQueryBuilderServiceSpec extends Specification {
           "FROM Transaction " +
           "TIMESERIES 60 seconds " +
           "SINCE ${start.epochSecond} UNTIL ${end.epochSecond} " +
-          "WHERE httpStatusCode LIKE '5% " +
+          "WHERE httpStatusCode LIKE '5%' " +
           "AND someKeyThatIsSetDuringDeployment LIKE 'someValue' " +
           "AND autoScalingGroupName LIKE 'myservice-prod-v01' " +
           "AND region LIKE 'us-west-2'"
diff --git a/kayenta-signalfx/README.md b/kayenta-signalfx/README.md
index b3936a53a..986cd3e13 100644
--- a/kayenta-signalfx/README.md
+++ b/kayenta-signalfx/README.md
@@ -18,7 +18,7 @@ This module adds support to Kayenta to use SignalFx as a metric source.
 ```
 
 ### Canary Config
-See [The metric query config](metric-query-config.md) page.
+See [canary-config](../docs/canary-config.md) and the [SignalFx metric query config](docs/metric-set-query-config.md) page.
 
 ## Development
 
diff --git a/kayenta-signalfx/docs/metric-set-query-config.md b/kayenta-signalfx/docs/metric-set-query-config.md
new file mode 100644
index 000000000..9aec08670
--- /dev/null
+++ b/kayenta-signalfx/docs/metric-set-query-config.md
@@ -0,0 +1,31 @@
+### SignalFxCanaryMetricSetQueryConfig (CanaryMetricSetQueryConfig)
+SignalFx specific query configurations.
+See [The integration test canary-config json](../src/integration-test/resources/integration-test-canary-config.json) for a real example.
+#### Properties
+- `metricName` **requests.count** (string, required) - Metric name.
+- `queryPairs` (array[[QueryPair](#query-pairs)], optional) - List of query pairs.
+- `aggregationMethod` (enum[string], optional) - How to aggregate each time series of collected data to a single data point. Defaults to mean.
+    - `bottom`
+    - `count`
+    - `max`
+    - `mean`
+    - `mean_plus_stddev`
+    - `median`
+    - `min`
+    - `random`
+    - `sample_stddev`
+    - `sample_variance`
+    - `size`
+    - `stddev`
+    - `sum`
+    - `top`
+    - `variance`
+- `type` (enum[string], required)
+    - `signalfx`
+
+
+### QueryPair (object)
+Can be dimensions, properties, or tags (for tags, use tag as key).
+#### Properties
+- `key` **uri** (string, required) - key
+- `value` **/v1/some-endpoint** - value
diff --git a/kayenta-signalfx/metric-query-config.md b/kayenta-signalfx/metric-query-config.md
deleted file mode 100644
index 733e4c1e3..000000000
--- a/kayenta-signalfx/metric-query-config.md
+++ /dev/null
@@ -1,27 +0,0 @@
-Example metric configuration for a canary config, in yaml for readability.
-
-See [The integration test canary-config json](src/integration-test/resources/integration-test-canary-config.json) for a real example.
-
-```yaml
-name: Error Rate for /v1/some-endpoint
-query:
-  metricName: kayenta.integration-test.internal-server-errors
-  queryPairs: # [Optional] Can be dimensions, properties, or tags (Use tag as key for tags).
-  - key: uri
-    value: /v1/some-endpoint
-  - key: status_code
-    value: "5*"
-  # Aggregate the N time series across each instance in a cluster to a single series, this gets used in the SignalFlow program
-  # Supported options are the stream method that support aggregation see: https://developers.signalfx.com/reference#signalflow-stream-methods-1
-  aggregationMethod: sum # [Optional] Defaults to mean
-  serviceType: signalfx
-  type: signalfx
-analysisConfigurations:
-  canary:
-    direction: increase
-    # Fail the canary if server errors increase.
-    critical: true
-groups:
-- Integration Test Group
-scopeName: default
-```
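
Reviewer note: the object model added in `docs/canary-config.md` may be easier to follow with a concrete instance. The Python sketch below is illustrative only and is not part of this patch or of Kayenta. It assembles a config whose metric mirrors the YAML example removed above (minus `serviceType`, which the new docs do not list) and checks a few documented rules: the required top-level fields, the `direction` and `nanStrategy` enums, `scopeName` being `default`, and group weights totalling 100. The helper name `validate_canary_config` is hypothetical, and the shape of `groupWeights` (a map from group name to weight) is an assumption; the MSON description leaves it ambiguous.

```python
import json

# Enum values copied from the CanaryAnalysisConfiguration section of docs/canary-config.md.
DIRECTIONS = {"increase", "decrease", "either"}
NAN_STRATEGIES = {"remove", "replace"}
REQUIRED_FIELDS = ("name", "description", "applications", "judge", "classifier")

# Example config; the metric mirrors the YAML example removed by this patch.
EXAMPLE_CONFIG = {
    "name": "my-app golden signals canary config",
    "description": "Canary config for my-app",
    "applications": ["ad-hoc"],
    "judge": {"name": "NetflixACAJudge-v1.0", "judgeConfigurations": {}},
    "metrics": [
        {
            "name": "Error Rate for /v1/some-endpoint",
            "query": {
                "metricName": "kayenta.integration-test.internal-server-errors",
                "queryPairs": [
                    {"key": "uri", "value": "/v1/some-endpoint"},
                    {"key": "status_code", "value": "5*"},
                ],
                "aggregationMethod": "sum",
                "type": "signalfx",
            },
            "groups": ["Integration Test Group"],
            "analysisConfigurations": {
                "canary": {"direction": "increase", "critical": True}
            },
            "scopeName": "default",
        }
    ],
    # Assumption: groupWeights maps each group name to its weight.
    "classifier": {"groupWeights": {"Integration Test Group": 100}},
}


def validate_canary_config(config):
    """Hypothetical helper: return a list of problems (empty if none were found)."""
    problems = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in config]

    weights = config.get("classifier", {}).get("groupWeights", {})
    if sum(weights.values()) != 100:
        problems.append("group weights must total 100")

    for metric in config.get("metrics", []):
        name = metric.get("name", "<unnamed>")
        canary = metric.get("analysisConfigurations", {}).get("canary", {})
        if canary.get("direction") not in DIRECTIONS:
            problems.append(f"{name}: invalid direction")
        # The docs mark nanStrategy as required, but the removed example omits it,
        # so this sketch only checks the value when it is present.
        if "nanStrategy" in canary and canary["nanStrategy"] not in NAN_STRATEGIES:
            problems.append(f"{name}: invalid nanStrategy")
        if metric.get("scopeName") != "default":
            problems.append(f"{name}: scopeName must be 'default'")
    return problems


if __name__ == "__main__":
    print(json.dumps(EXAMPLE_CONFIG, indent=2))
    print(validate_canary_config(EXAMPLE_CONFIG))
```

Returning a list of problems rather than raising on the first one is a design choice of this sketch, so that a single pass reports every rule a config breaks.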