Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add metric push api for alerting client side errors #331

Merged
merged 7 commits into from Jun 26, 2019

Conversation

johnduffell
Copy link
Member

@johnduffell johnduffell commented Jun 20, 2019

We have trouble at the moment that when things break on the client side, we don't know about it. We investigated using sentry but this was not possible. Now we have decided to have a fake img on the failure page that fetches this URL and we can cloudwatch alert on that.

Changes:
The most scary change is actually right at the bottom, I have updated the AWS lib version for everything!

TODO
DONE - deploy and test in CODE
DONE - check caching
DONE - check response content type
DONE - actually add alarm

  • add runscope test - EDIT: This is not possible without having two metrics and that seems less necessary due to reduced complexity
  • add alarm if the metric pushed by runscope has no data - EDIT see above

@coveralls
Copy link

coveralls commented Jun 20, 2019

Coverage Status

Coverage remained the same at 61.403% when pulling 9237841 on metric-push-api into 0d74c31 on master.

@guardian guardian deleted a comment Jun 20, 2019
@guardian guardian deleted a comment Jun 21, 2019
Copy link
Contributor

@twrichards twrichards left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Are we sure we want this as a Scala lambda (given JVM warm-up time) - rather than say a Node/TypeScript one? (note that the aws-sdk is pretty nice)
  2. Did you avoid Sentry because it requires JavaScript? If not, how come? (manage-frontend sends some custom error messages in certain scenarios and it picks them up and alerts nicely)

@johnduffell
Copy link
Member Author

thanks tom, if it's slow then it's not a problem as it won't be blocking anything on the page it's basicallyfire and forget. I did/am considering API gateway->queue and then have a lambda batching off the queue, but I (hope) it won't be high enough volume for that to be an issue. IF we're getting that many errors then we have bigger concerns than too many executions.

I was originally planning sentry, but unfortunately sentry is per person rather than setting up "urgent 9-5" style team alerts. I think having consistency with the other alerts would be a plus point. Our sentry is not clean and I'm not sure someone has the motivation to be an advocate for it/keeping it clear and cared about. If someone will that's great but we just feel this will give more benefit for the effort with the current situation.

@johnduffell
Copy link
Member Author

On that note though it would be good if we can have a similar setup for TS lambdas as we have here for scala ones, so that it's easy(er) to pop one in and share libraries where necessary. Either that or cross compile all these ones to ScalaJs 😜

@guardian guardian deleted a comment Jun 21, 2019
Copy link
Contributor

@jfsoul jfsoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach seems like the most straightforward way, and that is definitely a benefit. I've been trying to think if there's lower-footprint (but possibly over-smart) way like setting up a fastly service with logging that we only hit on errors and then alarm on some S3 metric of the bucket where we are logging 🙃

edit now I've thought more about alternatives:

  • no-op API gateway endpoint and set alarm on the Count metric for that API
  • no-op lambda (behind gateway) and set alarm on the Invocations metric for that lambda

handlers/metric-push-api/README.md Show resolved Hide resolved
handlers/metric-push-api/build.sbt Outdated Show resolved Hide resolved
handlers/metric-push-api/cfn.yaml Outdated Show resolved Hide resolved
StageMap:
CODE:
ApiName: metric-push-api-api-CODE
DomainName: metric-push-api-code.membership.guardianapis.com
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the convention for these service lambdas? metric-push-api.code.dev-guardianapis.com might be nicer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is because of the route 53 zone we are using for these, if there's a code.dev-guardianapis.com one then maybe we can put it in there. Or we can add one. I'll have a look.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at https://console.aws.amazon.com/route53/home?region=eu-west-1#resource-record-sets:Z1E4V12LQGXFEC it does seem to be the convention. Whether it should be or not is another matter! it might even have been me that started the convention by mistake.

There is a *.support.guardianapis.com certificate and record set so I will at least change "membership" to "support".
To go further, it would have to be a separate cert (and record set) for *.code.dev-support.guardianapis.com, do you think it's worth doing that?

handlers/metric-push-api/cfn.yaml Outdated Show resolved Hide resolved
handlers/metric-push-api/cfn.yaml Outdated Show resolved Hide resolved
handlers/metric-push-api/cfn.yaml Outdated Show resolved Hide resolved
handlers/metric-push-api/cfn.yaml Outdated Show resolved Hide resolved
steps: MetricPushRequest => Try[Unit]
): ApiGatewayRequest => ResponseModels.ApiResponse = req =>
(for {
wireUrlParams <- req.queryParamsAsCaseClass[WireUrlParams]()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we could also pull out the headers and validate that the Referer header has an expected domain support.theguardian.com. The consequences of unexpected requests to this lambda seem fairly innocuous, especially if we limit the concurrent executions, but if the referrer check is simple then it wouldn't hurt.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not checking this in the API gateway only version. I'm sure it's possible to check based on my reading the docs a lot, but I am reluctant to try! 😴

@johnduffell
Copy link
Member Author

@jfsoul with these settings and no lambda backing the API gateway:
image
we can get these logs, which seem sufficient to understand any browser issue, and we get per method count alerting.
image
If we want to add additional information we could add it into the request query string which is logged above.
The only down side is this is logged as a separate line from the headers, so if we use that functionality it will be hard to correlate with the browser headers just be searching.

@jfsoul
Copy link
Contributor

jfsoul commented Jun 24, 2019

Sounds good to me @johnduffell - if it's simpler, I say ship it!

@johnduffell
Copy link
Member Author

ok, I (think) i've got the basic cloudformation working, I haven't deleted all the scala changes yet or fixed up the other comments, but I'm going to do some testing in CODE.

@guardian guardian deleted a comment Jun 24, 2019
@guardian guardian deleted a comment Jun 25, 2019
@@ -0,0 +1,21 @@
package com.gu.util.apigateway
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote two tests, one for the existing and one for my changes. But I thought even after deleting my changes it was worth keeping the other test. Does seem a bit out of place in this PR though.

@@ -1,6 +1,8 @@
import sbt._

object Dependencies {

val awsVersion = "1.11.574"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was worth keeping this AWS update since I've noticed there is a new one. Not all the effectsTests pass, because of changes to dev salesforce, but I the s3 stuff is working, the SQS effectsTest still passes, and the SES stuff doesn't seem to be used (any more?)

@guardian guardian deleted a comment Jun 25, 2019
Copy link
Contributor

@jfsoul jfsoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@johnduffell johnduffell merged commit ebc4425 into master Jun 26, 2019
@johnduffell johnduffell deleted the metric-push-api branch June 26, 2019 11:31
@johnduffell johnduffell changed the title WIP: add metric push api for alerting client side errors add metric push api for alerting client side errors Jun 26, 2019
@johnduffell
Copy link
Member Author

I just realised that we can't add a runscope test as we only have one endpoint/metric now. I think with the reduced complexity maybe this isn't so necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants