Monitoring

There are several components to our EK centralized monitoring and alerting stack.

1. CloudWatch application logs

2. CloudTrail, X-Ray, APM, Prometheus metrics

3. Log and metric aggregation to Elasticsearch

4. Kibana dashboards and visualizations

5. CloudWatch alarms

CloudWatch Logs gathers application logs and events. CloudTrail, X-Ray, and Prometheus monitor application performance and capture metrics.

All logs are aggregated into Elasticsearch for real-time search and access. Kibana, the visualization front end for Elasticsearch, lets us visualize and analyze our application events so we can track load, troubleshoot, and monitor our entire platform.

CloudWatch alarms automate resource and latency alerts across our critical services.

Logs

Dashboards

Production API Dashboard

Development API Dashboard

Production Lambda Dashboard

Development Lambda Dashboard

X-Ray Dashboard

All Events

Production API Access Logs

Production API Request Logs

Production EKS Logs

Development EKS Logs

Alarms

CloudWatch alarms are created automatically when deployment events occur. Alarm notifications are sent to monitoring-dev@system.com and monitoring-prod@system.com as well as the Slack #alerts channel.

API Gateway     EC2                 Elasticsearch               SNS / SQS
5XX / 4XX       StatusCheckFailed   ClusterStatus.Red           NumberOfNotificationsFailed
Latency         Memory/CPU          ClusterIndexWritesBlocked   Deadletter
Lambda Error

Contributing

Project Structure

This project is built with Node.js and TypeScript and runs on AWS serverless architecture. Each environment has three associated applications:

Custom Application:

1. system-monitoring

This application includes custom functions that tag new and existing log groups by environment, push log events to Elasticsearch, and auto-create CloudWatch alarms per resource, triggered by CloudTrail events.

Third-Party Applications:

2. system-monitoring-SubscribeCloudWatchApplication

Auto-subscribes new and existing CloudWatch log groups to a custom Lambda function.

3. system-monitoring-SubscribeCloudWatchApplication-1-LambdaInvocationCustomResource
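The log path described above (log groups subscribed to a Lambda function, which pushes events to Elasticsearch) can be sketched as follows. CloudWatch Logs delivers subscription data to Lambda as base64-encoded, gzipped JSON, and the Elasticsearch bulk API takes newline-delimited action and document lines. The function names, the document fields, and the index name are assumptions for illustration and are not taken from this project's code.

```typescript
import { gunzipSync, gzipSync } from "zlib";

// Fields of a decoded CloudWatch Logs subscription payload that this sketch uses.
interface LogEvent { id: string; timestamp: number; message: string }
interface LogsPayload { logGroup: string; logStream: string; logEvents: LogEvent[] }

// Decode the `awslogs.data` string of a subscription event:
// base64 -> gzip -> JSON.
function decodeLogsPayload(data: string): LogsPayload {
  const json = gunzipSync(Buffer.from(data, "base64")).toString("utf8");
  return JSON.parse(json) as LogsPayload;
}

// Inverse of decodeLogsPayload, handy for building local test events.
function encodeLogsPayload(payload: LogsPayload): string {
  return gzipSync(Buffer.from(JSON.stringify(payload), "utf8")).toString("base64");
}

// Build an Elasticsearch _bulk request body: one action line followed by
// one document line per log event. `index` is a hypothetical index name.
function toBulkBody(payload: LogsPayload, index: string): string {
  const lines: string[] = [];
  for (const event of payload.logEvents) {
    lines.push(JSON.stringify({ index: { _index: index, _id: event.id } }));
    lines.push(JSON.stringify({
      "@timestamp": new Date(event.timestamp).toISOString(),
      logGroup: payload.logGroup,
      logStream: payload.logStream,
      message: event.message,
    }));
  }
  // The bulk API requires the body to end with a newline.
  return lines.join("\n") + "\n";
}
```

In a Lambda handler, `decodeLogsPayload(event.awslogs.data)` would produce the payload, and the string returned by `toBulkBody` would be sent to the cluster's `_bulk` endpoint.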

Lambda Functions

There are several custom Lambda functions that rely heavily on the AWS SDK for JavaScript, in particular the following classes: AWS.CloudWatch, AWS.CloudWatchLogs, AWS.APIGateway, AWS.Lambda, AWS.SNS, AWS.SQS

NOTE: All Lambda alarm functions filter for resources with a -dev or -prod substring in the name. Any new applications or resources using this naming convention will be automatically included.
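A minimal sketch of that naming filter, assuming a plain substring match. The helper names `environmentOf` and `monitoredResources` are hypothetical, and the choice to let -prod win when a name contains both substrings is an assumption, not the project's documented behavior.

```typescript
type Environment = "dev" | "prod";

// Return the environment implied by a resource name, or undefined when the
// name contains neither the -dev nor the -prod substring.
// Assumption: -prod takes precedence if both substrings are present.
function environmentOf(resourceName: string): Environment | undefined {
  if (resourceName.includes("-prod")) return "prod";
  if (resourceName.includes("-dev")) return "dev";
  return undefined;
}

// Keep only the resources that a substring-based filter would pick up.
function monitoredResources(names: string[]): string[] {
  return names.filter((name) => environmentOf(name) !== undefined);
}
```

Because this is a substring match, a name such as `orders-developer-tools` would also be treated as a dev resource, which is worth keeping in mind when naming new resources.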

Adding Alarms

Adding a new alarm is fairly straightforward once we have defined:

1. What AWS events will trigger the creation of this alarm? Consider existing resources as well as new resources.

2. What resource group and metric will this alarm monitor?

Available resource metric namespaces: AWS/ApiGateway, AWS/Lambda, AWS/EC2, AWS/ES, AWS/SNS

Example AWS/EC2 Alarm:

  // autoscaleGroup, alarmActions, and okAction are supplied by the calling function
  const params = {
    AlarmName: `EC2 for Autoscale group[${autoscaleGroup}] : Status Check Failed for over 10 min`,
    MetricName: 'StatusCheckFailed',
    Dimensions: [
      { Name: 'AutoScalingGroupName', Value: autoscaleGroup }
    ],
    Namespace: 'AWS/EC2', // StatusCheckFailed is an EC2 metric
    ComparisonOperator: 'GreaterThanOrEqualToThreshold',
    Period: 300, // seconds per datapoint
    Threshold: 1,
    EvaluationPeriods: 2,
    DatapointsToAlarm: 2, // 2 of 2 periods breaching (2 x 300 s = 10 min) triggers the alarm
    Statistic: 'Maximum',
    ActionsEnabled: true,
    AlarmActions: alarmActions,
    AlarmDescription: `auto-generated by Lambda [${process.env.AWS_LAMBDA_FUNCTION_NAME}]`,
    OKActions: okAction,
    TreatMissingData: 'notBreaching',
    Unit: 'Count' // StatusCheckFailed is reported as a count, not a duration
  }
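The Period, EvaluationPeriods, and DatapointsToAlarm values in the example above together decide how long a breach must last before the alarm fires. The helper below is a rough sketch of that arithmetic. The name `secondsToAlarm` and the `AlarmTiming` interface are hypothetical, and the result is a lower bound, not an exact CloudWatch guarantee.

```typescript
// Only the alarm fields that decide how long a breach must last.
interface AlarmTiming {
  Period: number;             // seconds per datapoint
  EvaluationPeriods: number;  // how many recent datapoints are examined
  DatapointsToAlarm?: number; // how many of those must breach
}

// Approximate minimum time, in seconds, that the metric must be breaching
// before the alarm can move to the ALARM state.
function secondsToAlarm(timing: AlarmTiming): number {
  // When DatapointsToAlarm is omitted, every evaluated datapoint must breach.
  const datapoints = timing.DatapointsToAlarm ?? timing.EvaluationPeriods;
  if (datapoints > timing.EvaluationPeriods) {
    throw new RangeError("DatapointsToAlarm cannot exceed EvaluationPeriods");
  }
  return timing.Period * datapoints;
}
```

For the example values (Period 300, EvaluationPeriods 2, DatapointsToAlarm 2) this gives 600 seconds. An alarm that should fire after about one minute would need something like Period 60 with a single evaluation period.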

Adding an alarm series to the main alarm function

  // Start every alarm series at once, then wait for all of them to finish
  await Promise.all([
    lambdaAlarms.createAlarms(),
    esAlarms.createAlarms(),
    snsAlarms.createAlarms(),
    queueAlarms.createAlarms()
  ])
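If one failing series should not hide the outcome of the others, a variant could collect failures per series and report them together. This is a sketch only: the `AlarmSeries` interface and `createAllAlarms` function are hypothetical stand-ins for the alarm modules shown above, which are assumed to expose an async `createAlarms()`.

```typescript
// Hypothetical shape of an alarm module: a label plus an async createAlarms().
interface AlarmSeries {
  name: string;
  createAlarms(): Promise<void>;
}

// Run every series concurrently and return the names of the ones that failed.
// Each rejection is converted into a value, so Promise.all never short-circuits.
async function createAllAlarms(series: AlarmSeries[]): Promise<string[]> {
  const outcomes = await Promise.all(
    series.map((s) =>
      s.createAlarms().then(
        (): string | null => null,   // success: nothing to report
        (): string | null => s.name  // failure: remember which series failed
      )
    )
  );
  return outcomes.filter((name): name is string => name !== null);
}
```

The caller can then log the returned names or fail the invocation when the list is not empty, depending on how strict the deployment should be.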
