chore(docs): add documentation on the object structure of canary config. (#622)

* chore(docs): add documentation on the object structure of canary config.
* chore(docs): Add id to the canary-config doc and add docs for New Relic Insights Metric Set Query Config.
* fix(docs): Fix typo in URL Co-Authored-By: nisanharamati <hanisan@gmail.com>
* fix(docs): Fix typo in URL Co-Authored-By: nisanharamati <hanisan@gmail.com>
* fix(docs): Add anchor tag to link. Co-Authored-By: nisanharamati <hanisan@gmail.com>
* chore: Update README.md Co-Authored-By: Chris Sanden <chris.sanden@gmail.com>
spinnaker · Oct 21, 2019 · 2729ab2
1 parent 08b119a
commit 2729ab2
Showing 7 changed files with 149 additions and 32 deletions.
diff --git a/README.md b/README.md
@@ -12,6 +12,9 @@ The quality of the canary version is assessed by comparing key metrics that desc
 Canaries are usually run against deployments containing changes to code, but they
 can also be used for operational changes, including changes to configuration.
 
+### Creating Canary Config
+See the [Canary Config Object model](docs/canary-config.md), documented in [Markdown Syntax for Object Notation (MSON)](https://github.com/apiaryio/mson), for how a canary config is defined.
+
 ### Debugging
 
 To start the JVM in debug mode, set the Java system property `DEBUG=true`:

diff --git a/docs/canary-config.md b/docs/canary-config.md
@@ -0,0 +1,100 @@
+Objects in this document are documented using [Markdown Syntax for Object Notation (MSON)].
+
+### Canary Config Object model (object)
+
+#### Properties
+- `id` **some-custom-id** (string, optional) - If not supplied, a GUID is generated for you, but you can supply a custom string here. The id is used when you call Kayenta to trigger a canary execution without supplying the config as part of the request.
+- `name` **my-app golden signals canary config** (string, required) - Name for canary configuration.
+- `description` **Canary config for my-app** (string, required) - Description for the canary configuration.
+- `applications` (array[string], required) - A list of applications that the canary is for. Unless you are storing the configuration in Kayenta and sharing it, a list with the single entry `ad-hoc` is enough.
+- `judge` ([CanaryJudgeConfig](#canary-judge-config), required) - Judge configuration.
+- `metrics` (array[[CanaryMetricConfig](#canary-metric-config)]) - List of metrics to analyze.
+- `templates` (map<string, string>, optional) - Templates allow you to compose and parameterize advanced queries against your telemetry provider. Parameterized queries are hydrated by values provided in the canary stage. The <strong>project</strong>, <strong>resourceType</strong>, <strong>scope</strong>, and <strong>location</strong> variable bindings are implicitly available. For example, you can interpolate <strong>project</strong> using the following syntax: <strong>\${project}</strong>.
+- `classifier` ([CanaryClassifierConfig](#canary-classifier-config), required) - The classification configuration, such as group weights.
+
+<a name="canary-judge-config"></a>
+### CanaryJudgeConfig (object)
+Currently there is only one judge, so this object should be identical across all configurations.
+#### Properties
+- `name` **NetflixACAJudge-v1.0** (string, required) - Judge to use. Currently the only judge is `NetflixACAJudge-v1.0`.
+- `judgeConfigurations` **{}** (object, required) - Map<string, object> of judgement configuration. Currently this should always be an empty object.
+
+<a name="canary-metric-config"></a>
+### CanaryMetricConfig (object)
+Describes a metric that will be used in determining the health of a canary.
+#### Properties
+- `name` **http errors** (string, required) - Human readable name of the metric under test.
+- `query` (enum[[CanaryMetricSetQueryConfig](#canary-metrics-set-query-config)], required) - Query config object for your metric source type.
+- `groups` (array[string], required) - List of metrics groups that this metric will belong to.
+- `analysisConfigurations` ([AnalysisConfiguration](#analysis-configuration), required) - Analysis configuration, describes how to judge a given metric.
+- `scopeName` (enum[string], required)
+    - `default` - only accepted value here
+
+<a name="canary-metrics-set-query-config"></a>
+### CanaryMetricSetQueryConfig (object)
+Metric source interface for describing how to query for a given metric / metric source.
+#### Properties
+- One of
+    - AtlasCanaryMetricSetQueryConfig
+    - DatadogCanaryMetricSetQueryConfig
+    - GraphiteCanaryMetricSetQueryConfig
+    - InfluxdbCanaryMetricSetQueryConfig
+    - [NewRelicInsightsCanaryMetricSetQueryConfig](../kayenta-newrelic-insights/docs/metric-set-query-config.md)
+    - PrometheusCanaryMetricSetQueryConfig
+    - [SignalFxCanaryMetricSetQueryConfig](../kayenta-signalfx/docs/metric-set-query-config.md)
+    - StackdriverCanaryMetricSetQueryConfig
+    - WavefrontCanaryMetricSetQueryConfig
+
+<a name="analysis-configuration"></a>
+### AnalysisConfiguration (object)
+Wrapper object that includes the Canary Analysis Configuration and describes how to judge a given metric.
+#### Properties
+- `canary` ([CanaryAnalysisConfiguration](#canary-analysis-configuration))
+
+<a name="canary-analysis-configuration"></a>
+### CanaryAnalysisConfiguration (object)
+Describes how to judge a metric. See the [Netflix Automated Canary Analysis Judge] for more information.
+#### Properties
+- `direction` (enum[string], required) - Which direction of statistical change triggers the metric to fail.
+    - `increase` - Use when you want the canary to fail only if it is significantly higher than the baseline (error counts, memory usage, etc, where a decrease is not a failure).
+    - `decrease` - Use when you want the canary to fail only if it is significantly lower than the baseline (success counts, etc, where a larger number is not a failure).
+    - `either` - Use when you want the canary to fail if it is significantly higher or lower than the baseline.
+- `nanStrategy` (enum[string], required) - How to handle NaN values which can occur if the metric does not return data for a particular time interval.
+    - `remove` - Use when you expect a metric to always have data and you want the NaNs removed from your data set (usage metrics).
+    - `replace` - Use when you expect a metric to return no data in certain cases and you want the NaNs replaced with zeros (for example, a count metric that returns no data for an interval in which no errors happened).
+- `critical` **true** (boolean, optional) - Use to fail the entire canary if this metric fails (recommended for important metrics that signal service outages or severe problems).
+- `mustHaveData` **true** (boolean, optional) - Use to fail a metric if data is missing.
+- `effectSize` ([EffectSize](#effect-size), optional) - Controls how different the metric must be from the baseline to fail or fail critically.
+
+<a name="effect-size"></a>
+### EffectSize
+Controls the degree of statistical significance the metric needs to fail or fail critically. 
+Metrics marked as critical can also define `criticalIncrease` and `criticalDecrease`. 
+See the [Netflix Automated Canary Analysis Judge] and [Mann Whitney Classifier] classes for more information.
+
+#### Properties
+- `allowedIncrease` **1.1** (number, optional) - Defaults to 1. The multiplier increase that must be met for the metric to fail. This example makes the metric fail when the metric has increased 10% from the baseline.
+- `allowedDecrease` **0.90** (number, optional) - Defaults to 1. The multiplier decrease that must be met for the metric to fail. This example makes the metric fail when the metric has decreased 10% from the baseline.
+- `criticalIncrease` **5.0** (number, optional) - Defaults to 1. The multiplier increase that must be met for the metric to be a critical failure and fail the entire analysis with a score of 0. This example makes the canary fail critically if there is a 5x increase.
+- `criticalDecrease` **0.5** (number, optional) - Defaults to 1. The multiplier decrease that must be met for the metric to be a critical failure and fail the entire analysis with a score of 0. This example makes the canary fail critically if there is a 50% decrease.
+
+<a name="canary-classifier-config"></a>
+### CanaryClassifierConfig
+#### Properties
+- `groupWeights` (enum[string], required)
+  - `groups` **"Latency" : 33** (object, required) - List of each metrics group along with its corresponding weight. Weights must total 100.
+
+<a name="links"></a>
+## Links
+- [Spinnaker Canary Best Practices]
+- [Canary analysis: Lessons learned and best practices from Google and Waze]
+- [Armory Kayenta Documentation]
+- [Example Signalfx canary config]
+
+[Spinnaker Canary Best Practices]: https://www.spinnaker.io/guides/user/canary/best-practices/
+[Armory Kayenta Documentation]: https://docs.armory.io/spinnaker/configure_kayenta/
+[Example Signalfx canary config]: https://github.com/spinnaker/kayenta/blob/master/kayenta-signalfx/metric-query-config.md
+[Markdown Syntax for Object Notation (MSON)]: https://github.com/apiaryio/mson
+[Canary analysis: Lessons learned and best practices from Google and Waze]: https://cloud.google.com/blog/products/devops-sre/canary-analysis-lessons-learned-and-best-practices-from-google-and-waze
+[Netflix Automated Canary Analysis Judge]: https://github.com/spinnaker/kayenta/blob/master/kayenta-judge/src/main/scala/com/netflix/kayenta/judge/NetflixACAJudge.scala
+[Mann Whitney Classifier]: https://github.com/spinnaker/kayenta/blob/master/kayenta-judge/src/main/scala/com/netflix/kayenta/judge/classifiers/metric/MannWhitneyClassifier.scala
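
Reviewer note: the object model in `docs/canary-config.md` may be easier to follow with a concrete instance. The following is a minimal sketch in Python, not taken from the commit. The field names come from the doc above; the values, the `Errors` group name, and the `check_group_weights` helper are hypothetical, and `groupWeights` is assumed to be a map from group name to weight.

```python
import json

# Hypothetical example values; field names follow docs/canary-config.md.
canary_config = {
    "name": "my-app golden signals canary config",
    "description": "Canary config for my-app",
    "applications": ["ad-hoc"],
    "judge": {"name": "NetflixACAJudge-v1.0", "judgeConfigurations": {}},
    "metrics": [
        {
            "name": "http errors",
            "query": {
                "type": "newrelic",
                "select": "SELECT count(*) FROM Transaction",
                "q": "httpStatusCode LIKE '5%'",
            },
            "groups": ["Errors"],
            "analysisConfigurations": {
                "canary": {
                    "direction": "increase",
                    "nanStrategy": "replace",
                    "critical": True,
                    "effectSize": {"allowedIncrease": 1.1, "criticalIncrease": 5.0},
                }
            },
            "scopeName": "default",
        }
    ],
    # Assumption: groupWeights maps each metrics group to its weight.
    "classifier": {"groupWeights": {"Errors": 100}},
}


def check_group_weights(config):
    """Return a list of problems found in the classifier group weights."""
    weights = config["classifier"]["groupWeights"]
    problems = []
    total = sum(weights.values())
    if total != 100:
        problems.append(f"weights total {total}, expected 100")
    # Every group referenced by a metric should have a weight.
    used = {group for metric in config["metrics"] for group in metric["groups"]}
    for group in sorted(used - set(weights)):
        problems.append(f"group {group!r} has no weight")
    return problems


# Serialized form, e.g. for sending as a request body.
payload = json.dumps(canary_config, indent=2)
```

Calling `check_group_weights(canary_config)` on this example returns an empty list; a config whose weights do not total 100 yields one message per problem.
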
diff --git a/kayenta-newrelic-insights/docs/metric-set-query-config.md b/kayenta-newrelic-insights/docs/metric-set-query-config.md
@@ -0,0 +1,10 @@
+### NewRelicCanaryMetricSetQueryConfig (CanaryMetricSetQueryConfig)
+New Relic Insights specific query configurations.
+#### Properties
+- `select` **SELECT count(\*) FROM Transaction** (string, optional) - The full select query component of the NRQL statement. See the [NRQL Docs](https://docs.newrelic.com/docs/query-data/nrql-new-relic-query-language/getting-started/nrql-syntax-components-functions).
+- `q` **httpStatusCode LIKE '5%'** (string, optional) - NRQL query segment for the WHERE clause.
+- `customInlineTemplate` **SELECT count(\*) FROM Transaction TIMESERIES 60 seconds SINCE ${startEpochSeconds} UNTIL ${endEpochSeconds} WHERE httpStatusCode LIKE '5%' AND someKeyThatIsSetDuringDeployment LIKE '${someKeyThatWasProvidedInExtendedScopeParams}' AND autoScalingGroupName LIKE '${scope}' AND region LIKE '${location}'** (string, optional) - Custom inline template. Use either this or `select` + `q`. It lets you write your own NRQL; note that your NRQL must use the TIMESERIES keyword.
+<!-- - `customFilterTemplate` (string, optional) **todo** // Need to consult with @duftler on how this works -->
+- `type` (enum[string], required)
+    - `newrelic`
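
Reviewer note: to illustrate how the `select` and `q` properties relate to a full NRQL statement, here is a rough Python sketch. It is not Kayenta's query builder; `build_nrql` and its parameters are hypothetical, and the clause order simply mirrors the `customInlineTemplate` example above.

```python
def build_nrql(select, q, start_epoch, end_epoch, step_seconds=60):
    """Join the `select` and `q` segments into one NRQL string (illustrative only)."""
    parts = [
        select,
        f"TIMESERIES {step_seconds} seconds",
        f"SINCE {start_epoch} UNTIL {end_epoch}",
    ]
    # `q` is optional; only add a WHERE clause when it is present.
    if q:
        parts.append(f"WHERE {q}")
    return " ".join(parts)


# Hypothetical time range, in epoch seconds.
example = build_nrql(
    "SELECT count(*) FROM Transaction",
    "httpStatusCode LIKE '5%'",
    1571616000,
    1571619600,
)
```

The sketch leaves out the scope and location filters that the spec below shows being appended to real queries.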
+
diff --git a/...c/test/groovy/com/netflix/kayenta/newrelic/metrics/NewRelicQueryBuilderServiceSpec.groovy b/...c/test/groovy/com/netflix/kayenta/newrelic/metrics/NewRelicQueryBuilderServiceSpec.groovy
@@ -102,15 +102,15 @@ class NewRelicQueryBuilderServiceSpec extends Specification {
           "FROM Transaction " +
           "TIMESERIES 60 seconds " +
           "SINCE \${startEpochSeconds} UNTIL \${endEpochSeconds} " +
-          "WHERE httpStatusCode LIKE '5% " +
+          "WHERE httpStatusCode LIKE '5%' " +
           "AND autoScalingGroupName LIKE '\${scope}' " +
           "AND region LIKE '\${location}'",
       expectedQuery       :
         "SELECT count(*) " +
           "FROM Transaction " +
           "TIMESERIES 60 seconds " +
           "SINCE ${start.epochSecond} UNTIL ${end.epochSecond} " +
-          "WHERE httpStatusCode LIKE '5% " +
+          "WHERE httpStatusCode LIKE '5%' " +
           "AND autoScalingGroupName LIKE 'myservice-prod-v01' " +
           "AND region LIKE 'us-west-2'"
     ],
@@ -124,7 +124,7 @@ class NewRelicQueryBuilderServiceSpec extends Specification {
           "FROM Transaction " +
           "TIMESERIES 60 seconds " +
           "SINCE \${startEpochSeconds} UNTIL \${endEpochSeconds} " +
-          "WHERE httpStatusCode LIKE '5% " +
+          "WHERE httpStatusCode LIKE '5%' " +
           "AND someKeyThatIsSetDuringDeployment LIKE '\${someKey}' " +
           "AND autoScalingGroupName LIKE '\${scope}' " +
           "AND region LIKE '\${location}'",
@@ -133,7 +133,7 @@ class NewRelicQueryBuilderServiceSpec extends Specification {
           "FROM Transaction " +
           "TIMESERIES 60 seconds " +
           "SINCE ${start.epochSecond} UNTIL ${end.epochSecond} " +
-          "WHERE httpStatusCode LIKE '5% " +
+          "WHERE httpStatusCode LIKE '5%' " +
           "AND someKeyThatIsSetDuringDeployment LIKE 'someValue' " +
           "AND autoScalingGroupName LIKE 'myservice-prod-v01' " +
           "AND region LIKE 'us-west-2'"

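Reviewer note: the spec above checks that `${...}` placeholders in an inline template are replaced with scope values. A minimal Python sketch of that substitution, using the standard library's `string.Template` rather than Kayenta's own code, could look like this. The epoch values and the `expand` helper are hypothetical.

```python
from string import Template

# Same shape as the inline template in the spec, with the closing quote after 5%.
inline_template = (
    "SELECT count(*) FROM Transaction TIMESERIES 60 seconds "
    "SINCE ${startEpochSeconds} UNTIL ${endEpochSeconds} "
    "WHERE httpStatusCode LIKE '5%' "
    "AND someKeyThatIsSetDuringDeployment LIKE '${someKey}' "
    "AND autoScalingGroupName LIKE '${scope}' "
    "AND region LIKE '${location}'"
)


def expand(template, bindings):
    """Replace ${name} placeholders; raises KeyError if a binding is missing."""
    return Template(template).substitute(bindings)


query = expand(
    inline_template,
    {
        "startEpochSeconds": 1571616000,  # hypothetical start time
        "endEpochSeconds": 1571619600,    # hypothetical end time
        "someKey": "someValue",
        "scope": "myservice-prod-v01",
        "location": "us-west-2",
    },
)
```

Using `substitute` rather than `safe_substitute` makes a missing binding fail loudly instead of leaving a literal `${...}` in the query.
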
diff --git a/kayenta-signalfx/README.md b/kayenta-signalfx/README.md
@@ -18,7 +18,7 @@ This module adds support to Kayenta to use SignalFx as a metric source.
 ```
 
 ### Canary Config
-See [The metric query config](metric-query-config.md) page.
+See [canary-config](../docs/canary-config.md) and the [SignalFx metric query config](docs/metric-set-query-config.md) page.
 
 ## Development
 

diff --git a/kayenta-signalfx/docs/metric-set-query-config.md b/kayenta-signalfx/docs/metric-set-query-config.md
@@ -0,0 +1,31 @@
+### SignalFxCanaryMetricSetQueryConfig (CanaryMetricSetQueryConfig)
+SignalFx specific query configurations.
+See [The integration test canary-config json](../src/integration-test/resources/integration-test-canary-config.json) for a real example.
+#### Properties
+- `metricName` **requests.count** (string, required) - Metric name.
+- `queryPairs` (array[[QueryPair](#query-pairs)], optional) - List of query pairs. 
+- `aggregationMethod` (enum[string], optional) - How to aggregate each time series of collected data to a single data point. Defaults to mean.
+  - `bottom`
+  - `count`
+  - `max`
+  - `mean`
+  - `mean_plus_stddev`
+  - `median`
+  - `min`
+  - `random`
+  - `sample_stddev`
+  - `sample_variance`
+  - `size`
+  - `stddev`
+  - `sum`
+  - `top`
+  - `variance`
+- `type` (enum[string], required)
+    - `signalfx`
+
+<a name="query-pairs"></a>
+### QueryPair (object)
+Can be dimensions, properties, or tags (for tags, use tag as key).
+#### Properties
+- `key` **uri** (string, required) - key
+- `value` **/v1/some-endpoint** - value
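
Reviewer note: a small Python sketch of a SignalFx query config built from the properties documented above. The `signalfx_query` helper is hypothetical; the property names and the aggregation method list are copied from the doc, and the default of `mean` follows its description.

```python
# Aggregation methods as listed in the doc above.
AGGREGATION_METHODS = {
    "bottom", "count", "max", "mean", "mean_plus_stddev", "median", "min",
    "random", "sample_stddev", "sample_variance", "size", "stddev", "sum",
    "top", "variance",
}


def signalfx_query(metric_name, query_pairs=None, aggregation_method="mean"):
    """Build a SignalFx query config dict, rejecting unknown aggregation methods."""
    if aggregation_method not in AGGREGATION_METHODS:
        raise ValueError(f"unknown aggregationMethod: {aggregation_method!r}")
    return {
        "type": "signalfx",
        "metricName": metric_name,
        "aggregationMethod": aggregation_method,
        # Each pair is a dimension, property, or tag filter.
        "queryPairs": [
            {"key": key, "value": value}
            for key, value in (query_pairs or {}).items()
        ],
    }


query_config = signalfx_query("requests.count", {"uri": "/v1/some-endpoint"})
```
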
diff --git a/kayenta-signalfx/metric-query-config.md b/kayenta-signalfx/metric-query-config.md