Do not merge: SLO solution (Cloudwatch embedded metric log format) proof of concept #696

Closed
wants to merge 2 commits into from

Conversation

michaelwmcnamara
Contributor

@michaelwmcnamara michaelwmcnamara commented Jul 29, 2022

What does this change?

This is a proof-of-concept PR to demonstrate the validity of using the AWS embedded metric format to capture metrics directly from log processing, without incurring the cost or potential blocking of a cloudwatch.putMetric() call.
It addresses the problem of tracking historical SLIs for the mobile team. The root of the problem is that the metrics are pulled from the logs by a Kibana or Grafana dashboard, so there is currently no way to persist the values beyond the lifetime of those log files.

The solution we propose is to log these values as CloudWatch metrics. These persist far longer than the log files (CloudWatch retains metric data for up to 15 months) and can then be accessed by dashboarding software, e.g. Grafana, and displayed at will.

Normally metrics are sent to CloudWatch using a putMetric() request. However, this is a blocking call, so it adds risk and slows things down, and it has a specific cost of US$0.01 per 1,000 requests.
The AWS embedded metric format avoids both. AWS extracts the metrics as part of its own log processing, so that work happens outside our code, and as we are not making a putMetric() request we avoid the above-mentioned cost.

This code contains a working example of a metric being sent as part of a log message. We have tested in code and confirmed that this metric is indeed created.
Full details on the format needed to take advantage of this are here
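
For readers without the diff to hand, here is a minimal sketch of what an embedded-metric-format log line can look like. It is written in Python for illustration only and is not the code in this PR. The metric name is the one this POC logs; the namespace "Notifications", the "Stage" dimension, and the function name `emf_log_line` are assumptions made up for the example.

```python
import json
import time


def emf_log_line(namespace, metric_name, value, unit="Milliseconds", dimensions=None):
    """Build one embedded-metric-format log line as a JSON string."""
    dimensions = dimensions or {}
    record = {
        # The "_aws" member tells CloudWatch which root-level fields are metrics.
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set, listing the keys of the dimension fields below.
                    "Dimensions": [list(dimensions)],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # The metric value itself lives at the root, keyed by the metric name.
        metric_name: value,
    }
    record.update(dimensions)
    return json.dumps(record)


# Hypothetical namespace and dimension; in a Lambda, stdout goes to CloudWatch Logs.
print(
    emf_log_line(
        "Notifications",
        "harvester.notificationProcessingTime",
        1234,
        dimensions={"Stage": "PROD"},
    )
)
```

The only work done in the function's own process is serialising a JSON object and writing a log line, which is why no putMetric() request is needed.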

Detailed documentation on how to use Embedded metrics and their advantages and disadvantages can be found here

In addition an excellent study of using these metrics is in the first 10 minutes of this video

@michaelwmcnamara michaelwmcnamara changed the title Do not merge: Embedded metric format proof of concept Do not merge: SLO solution (Cloudwatch embedded metric log format) proof of concept Jul 29, 2022
@michaelwmcnamara michaelwmcnamara requested review from DavidLawes and waisingyiu and removed request for DavidLawes July 29, 2022 17:08
@michaelwmcnamara
Contributor Author

Adding approvers for visibility and feedback - no merge requested in its current state.

@michaelwmcnamara
Contributor Author

This POC adds this logging for the "harvester.notificationProcessingTime" metric. Other metrics could be added in the same structure, as the _aws object takes an array of metrics for processing.
We would need to repeat this for the other lambdas in the architecture, so the real implementation would be best served by abstracting the log formatting into a common function. For now we just wanted to show that it works.
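
As a rough idea of what that common function could look like, here is a sketch in Python (again for illustration, not the language or code of this PR). The helper name `emit_metrics`, the namespace, and the second metric "harvester.notificationsProcessed" are hypothetical; only "harvester.notificationProcessingTime" comes from this POC.

```python
import json
import time


def emit_metrics(namespace, metrics, dimensions=None, write=print):
    """Log several metrics in a single embedded-metric-format line.

    `metrics` maps metric name -> (value, unit).
    `write` is the function that outputs the log line (print by default).
    """
    dimensions = dimensions or {}
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [list(dimensions)],
                    # One entry per metric, all in the same array.
                    "Metrics": [
                        {"Name": name, "Unit": unit}
                        for name, (_, unit) in metrics.items()
                    ],
                }
            ],
        }
    }
    record.update(dimensions)
    # Metric values sit at the root of the record, keyed by metric name.
    record.update({name: value for name, (value, _) in metrics.items()})
    line = json.dumps(record)
    write(line)
    return line
```

Each lambda would then call something like `emit_metrics("Notifications", {"harvester.notificationProcessingTime": (1234, "Milliseconds"), "harvester.notificationsProcessed": (10, "Count")})` once per invocation, so adding a metric is a one-line change at the call site.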

@michaelwmcnamara
Contributor Author

Closing this as it is just for demonstration purposes

@DavidLawes
Contributor

Hi @michaelwmcnamara - I can't remember whether we fed back to you on this or not, but we took your proof of concept and implemented cloudwatch embedded metrics in our lambdas. It's been super useful and easy to make work. Thanks so much for sharing the concept with us!!

@DavidLawes DavidLawes deleted the mm-jw-embedded-metric-format-poc branch October 21, 2022 09:03