add metric push api for alerting client side errors #331

johnduffell · 2019-06-20T16:00:17Z

At the moment, when things break on the client side, we don't know about it. We investigated using Sentry but this was not possible. We have now decided to have a fake img on the failure page that fetches this URL, so that we can set a CloudWatch alert on it.

Changes:
The most scary change is actually right at the bottom, I have updated the AWS lib version for everything!

TODO
DONE - deploy and test in CODE
DONE - check caching
DONE - check response content type
DONE - actually add alarm

add runscope test - EDIT: This is not possible without having two metrics and that seems less necessary due to reduced complexity
add alarm if the metric pushed by runscope has no data - EDIT see above

coveralls · 2019-06-20T16:11:41Z

Coverage remained the same at 61.403% when pulling 9237841 on metric-push-api into 0d74c31 on master.

twrichards

Are we sure we want this as a Scala lambda (given JVM warm-up time) - rather than say a Node/TypeScript one? (note that the aws-sdk is pretty nice)
Did you avoid Sentry because it requires JavaScript? If not, how come? (manage-frontend sends some custom error messages in certain scenarios and it picks them up and alerts nicely)

johnduffell · 2019-06-21T09:36:20Z

thanks tom, if it's slow then it's not a problem as it won't be blocking anything on the page, it's basically fire and forget. I did/am considering API gateway->queue and then have a lambda batching off the queue, but I (hope) it won't be high enough volume for that to be an issue. If we're getting that many errors then we have bigger concerns than too many executions.

I was originally planning to use Sentry, but unfortunately Sentry alerts are per person rather than supporting "urgent 9-5" style team alerts. I think having consistency with the other alerts would be a plus point. Our Sentry is not clean and I'm not sure anyone has the motivation to be an advocate for it and keep it clean and cared about. If someone will, that's great, but we just feel this will give more benefit for the effort in the current situation.

johnduffell · 2019-06-21T09:38:57Z

On that note though it would be good if we can have a similar setup for TS lambdas as we have here for scala ones, so that it's easy(er) to pop one in and share libraries where necessary. Either that or cross compile all these ones to ScalaJs 😜

jfsoul

This approach seems like the most straightforward way, and that is definitely a benefit. I've been trying to think if there's a lower-footprint (but possibly over-smart) way, like setting up a fastly service with logging that we only hit on errors and then alarm on some S3 metric of the bucket where we are logging 🙃

Edit, now that I've thought more about alternatives:

no-op API gateway endpoint and set alarm on the Count metric for that API
no-op lambda (behind gateway) and set alarm on the Invocations metric for that lambda

handlers/metric-push-api/README.md

handlers/metric-push-api/build.sbt

handlers/metric-push-api/cfn.yaml

jfsoul · 2019-06-21T13:34:38Z

handlers/metric-push-api/cfn.yaml

+    StageMap:
+        CODE:
+            ApiName: metric-push-api-api-CODE
+            DomainName: metric-push-api-code.membership.guardianapis.com


is this the convention for these service lambdas? metric-push-api.code.dev-guardianapis.com might be nicer.

this is because of the route 53 zone we are using for these, if there's a code.dev-guardianapis.com one then maybe we can put it in there. Or we can add one. I'll have a look.

looking at https://console.aws.amazon.com/route53/home?region=eu-west-1#resource-record-sets:Z1E4V12LQGXFEC it does seem to be the convention. Whether it should be or not is another matter! it might even have been me that started the convention by mistake.

There is a *.support.guardianapis.com certificate and record set so I will at least change "membership" to "support".
To go further, it would have to be a separate cert (and record set) for *.code.dev-support.guardianapis.com, do you think it's worth doing that?

handlers/metric-push-api/cfn.yaml

jfsoul · 2019-06-21T14:06:34Z

handlers/metric-push-api/src/main/scala/com/gu/metric_push_api/Handler.scala

+    steps: MetricPushRequest => Try[Unit]
+  ): ApiGatewayRequest => ResponseModels.ApiResponse = req =>
+    (for {
+      wireUrlParams <- req.queryParamsAsCaseClass[WireUrlParams]()


I was thinking we could also pull out the headers and validate that the Referer header has an expected domain support.theguardian.com. The consequences of unexpected requests to this lambda seem fairly innocuous, especially if we limit the concurrent executions, but if the referrer check is simple then it wouldn't hurt.

I'm not checking this in the API gateway only version. I'm sure it's possible to check based on my reading the docs a lot, but I am reluctant to try! 😴
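
For illustration, the Referer check suggested above could look roughly like the sketch below. It is not part of this PR: the `RefererCheck` object, `expectedHosts` and the plain `Map[String, String]` of headers are hypothetical names chosen for the example, and only the `support.theguardian.com` domain comes from the discussion. It assumes the request headers are already available as a map, which may not match how `ApiGatewayRequest` exposes them.

```scala
import java.net.URI
import scala.util.Try

object RefererCheck {

  // Hypothetical allow-list; only this one domain is named in the discussion.
  val expectedHosts: Set[String] = Set("support.theguardian.com")

  // HTTP header names are case-insensitive, so find "Referer" ignoring case,
  // then parse it and extract the lower-cased host, if there is one.
  def refererHost(headers: Map[String, String]): Option[String] =
    headers
      .collectFirst { case (name, value) if name.equalsIgnoreCase("referer") => value }
      .flatMap(value => Try(new URI(value)).toOption) // malformed URLs become None
      .flatMap(uri => Option(uri.getHost))            // getHost is null for relative URIs
      .map(_.toLowerCase)

  // True only when the Referer host exactly matches an expected host,
  // so look-alike hosts with an extra suffix are rejected.
  def isExpectedReferer(headers: Map[String, String]): Boolean =
    refererHost(headers).exists(expectedHosts.contains)
}
```

A Referer header is trivial to set by hand outside a browser, so a check like this would only filter out accidental or casual traffic rather than act as real access control.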

lib/handler/src/main/scala/com/gu/util/apigateway/ApiGatewayResponse.scala

johnduffell · 2019-06-24T12:04:34Z

@jfsoul with these settings and no lambda backing the API gateway:

we can get these logs, which seem sufficient to understand any browser issue, and we get per method count alerting.

If we want to add additional information we could add it into the request query string which is logged above.
The only downside is that this is logged as a separate line from the headers, so if we use that functionality it will be hard to correlate with the browser headers just by searching.

jfsoul · 2019-06-24T13:11:37Z

Sounds good to me @johnduffell - if it's simpler, I say ship it!

johnduffell · 2019-06-24T15:51:15Z

ok, I (think) I've got the basic cloudformation working. I haven't deleted all the scala changes yet or fixed up the other comments, but I'm going to do some testing in CODE.

johnduffell · 2019-06-25T15:22:28Z

lib/handler/src/test/scala/com/gu/util/apigateway/ApiGatewayResponseTest.scala

@@ -0,0 +1,21 @@
+package com.gu.util.apigateway


I wrote two tests, one for the existing and one for my changes. But I thought even after deleting my changes it was worth keeping the other test. Does seem a bit out of place in this PR though.

johnduffell · 2019-06-25T15:26:00Z

project/Dependencies.scala

@@ -1,6 +1,8 @@
 import sbt._

 object Dependencies {
+
+  val awsVersion = "1.11.574"


I thought it was worth keeping this AWS update since I've noticed there is a new one. Not all the effectsTests pass, because of changes to dev salesforce, but the S3 stuff is working, the SQS effectsTest still passes, and the SES stuff doesn't seem to be used (any more?)

jfsoul

Nice work!

johnduffell · 2019-06-26T11:34:19Z

I just realised that we can't add a runscope test as we only have one endpoint/metric now. I think with the reduced complexity maybe this isn't so necessary.

add metric push api for alerting client side errors

1ba316c

metric-push-api cloudwatch fixes to get it to run in CODE

0bb8b89

guardian deleted a comment Jun 20, 2019

add permission for the lambda to push a metric

9018deb

guardian deleted a comment Jun 21, 2019

twrichards reviewed Jun 21, 2019

change response to 204 No Content to save data and don't cache

08424a5

guardian deleted a comment Jun 21, 2019

jfsoul reviewed Jun 21, 2019

use API gateway with mock responses to log client side errors

61c142a

guardian deleted a comment Jun 24, 2019

mock api gateway backend needs the status code in the request body

a01506b

guardian deleted a comment Jun 25, 2019

remove unneeded refactoring from metrics lambda deletion

9237841

johnduffell commented Jun 25, 2019

guardian deleted a comment Jun 25, 2019

jfsoul approved these changes Jun 25, 2019

johnduffell merged commit ebc4425 into master Jun 26, 2019

johnduffell deleted the metric-push-api branch June 26, 2019 11:31

johnduffell changed the title ~~WIP: add metric push api for alerting client side errors~~ add metric push api for alerting client side errors Jun 26, 2019

johnduffell mentioned this pull request Jun 28, 2019

push a GET to cloudwatch if a client side error happens guardian/support-frontend#1932

Merged
