New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add metric push api for alerting client side errors #331
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Are we sure we want this as a Scala lambda (given JVM warm-up time) - rather than say a Node/TypeScript one? (note that the aws-sdk is pretty nice)
- Did you avoid Sentry because it requires JavaScript? If not, how come? (
manage-frontend
sends some custom error messages in certain scenarios and it picks them up and alerts nicely)
thanks tom, if it's slow then it's not a problem as it won't be blocking anything on the page it's basicallyfire and forget. I did/am considering API gateway->queue and then have a lambda batching off the queue, but I (hope) it won't be high enough volume for that to be an issue. IF we're getting that many errors then we have bigger concerns than too many executions. I was originally planning sentry, but unfortunately sentry is per person rather than setting up "urgent 9-5" style team alerts. I think having consistency with the other alerts would be a plus point. Our sentry is not clean and I'm not sure someone has the motivation to be an advocate for it/keeping it clear and cared about. If someone will that's great but we just feel this will give more benefit for the effort with the current situation. |
On that note though it would be good if we can have a similar setup for TS lambdas as we have here for scala ones, so that it's easy(er) to pop one in and share libraries where necessary. Either that or cross compile all these ones to ScalaJs 😜 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach seems like the most straightforward way, and that is definitely a benefit. I've been trying to think if there's lower-footprint (but possibly over-smart) way like setting up a fastly service with logging that we only hit on errors and then alarm on some S3 metric of the bucket where we are logging 🙃
edit now I've thought more about alternatives:
- no-op API gateway endpoint and set alarm on the
Count
metric for that API - no-op lambda (behind gateway) and set alarm on the
Invocations
metric for that lambda
handlers/metric-push-api/cfn.yaml
Outdated
StageMap: | ||
CODE: | ||
ApiName: metric-push-api-api-CODE | ||
DomainName: metric-push-api-code.membership.guardianapis.com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the convention for these service lambdas? metric-push-api.code.dev-guardianapis.com
might be nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is because of the route 53 zone we are using for these, if there's a code.dev-guardianapis.com
one then maybe we can put it in there. Or we can add one. I'll have a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at https://console.aws.amazon.com/route53/home?region=eu-west-1#resource-record-sets:Z1E4V12LQGXFEC it does seem to be the convention. Whether it should be or not is another matter! it might even have been me that started the convention by mistake.
There is a *.support.guardianapis.com certificate and record set so I will at least change "membership" to "support".
To go further, it would have to be a separate cert (and record set) for *.code.dev-support.guardianapis.com, do you think it's worth doing that?
steps: MetricPushRequest => Try[Unit] | ||
): ApiGatewayRequest => ResponseModels.ApiResponse = req => | ||
(for { | ||
wireUrlParams <- req.queryParamsAsCaseClass[WireUrlParams]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking we could also pull out the headers and validate that the Referer
header has an expected domain support.theguardian.com
. The consequences of unexpected requests to this lambda seem fairly innocuous, especially if we limit the concurrent executions, but if the referrer check is simple then it wouldn't hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not checking this in the API gateway only version. I'm sure it's possible to check based on my reading the docs a lot, but I am reluctant to try! 😴
lib/handler/src/main/scala/com/gu/util/apigateway/ApiGatewayResponse.scala
Outdated
Show resolved
Hide resolved
@jfsoul with these settings and no lambda backing the API gateway: |
Sounds good to me @johnduffell - if it's simpler, I say ship it! |
ok, I (think) i've got the basic cloudformation working, I haven't deleted all the scala changes yet or fixed up the other comments, but I'm going to do some testing in CODE. |
@@ -0,0 +1,21 @@ | |||
package com.gu.util.apigateway |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote two tests, one for the existing and one for my changes. But I thought even after deleting my changes it was worth keeping the other test. Does seem a bit out of place in this PR though.
@@ -1,6 +1,8 @@ | |||
import sbt._ | |||
|
|||
object Dependencies { | |||
|
|||
val awsVersion = "1.11.574" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was worth keeping this AWS update since I've noticed there is a new one. Not all the effectsTests pass, because of changes to dev salesforce, but I the s3 stuff is working, the SQS effectsTest still passes, and the SES stuff doesn't seem to be used (any more?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
I just realised that we can't add a runscope test as we only have one endpoint/metric now. I think with the reduced complexity maybe this isn't so necessary. |
We have trouble at the moment that when things break on the client side, we don't know about it. We investigated using sentry but this was not possible. Now we have decided to have a fake img on the failure page that fetches this URL and we can cloudwatch alert on that.
Changes:
The most scary change is actually right at the bottom, I have updated the AWS lib version for everything!
TODO
DONE - deploy and test in CODE
DONE - check caching
DONE - check response content type
DONE - actually add alarm