moniecodes/serverless_typescript

Monitoring

Our centralized EK (Elasticsearch, Kibana) monitoring and alerting stack has several components:

1. CloudWatch application logs
2. CloudTrail, X-Ray, APM, Prometheus metrics
3. Log and metric aggregation to Elasticsearch
4. Kibana dashboards and visualizations
5. CloudWatch alarms

CloudWatch Logs gathers application logs and events. CloudTrail, X-Ray, and Prometheus monitor application performance and capture metrics.

All logs are aggregated into Elasticsearch for real-time search and access. Kibana, the visualization front end for Elasticsearch, lets us visualize and analyze application events so we can track load, troubleshoot, and monitor the entire platform.

CloudWatch alarms automate resource and latency alerts across our critical services.

Logs

Dashboards

All Events

Alarms

CloudWatch alarms are generated automatically, triggered by deployment events. Alarm notifications are pushed to monitoring-dev@system.com and monitoring-prod@system.com as well as the Slack #alerts channel.

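| API Gateway  | EC2               | Elasticsearch             | SNS / SQS                   |
|--------------|-------------------|---------------------------|-----------------------------|
| 5XX / 4XX    | StatusCheckFailed | ClusterStatus.Red         | NumberOfNotificationsFailed |
| Latency      | Memory/CPU        | ClusterIndexWritesBlocked | Deadletter                  |
| Lambda Error |                   |                           |                             |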

Contributing

Project Structure

This project is built with Node.js and TypeScript and runs on AWS serverless architecture. Three applications are associated with each environment:

Custom Application:

1. system-monitoring

This application includes custom functions that tag new and existing log groups by environment, push log events to Elasticsearch, and auto-create CloudWatch alarms per resource, triggered by CloudTrail events.

Third Party Application:

2. system-monitoring-SubscribeCloudWatchApplication

Auto-subscribes new and existing CloudWatch log groups to a custom Lambda function.

3. system-monitoring-SubscribeCloudWatchApplication-1-LambdaInvocationCustomResource

Lambda Functions

Several custom Lambda functions rely heavily on the following AWS SDK for JavaScript classes and their methods: AWS.CloudWatch, AWS.CloudWatchLogs, AWS.APIGateway, AWS.Lambda, AWS.SNS, AWS.SQS

NOTE: All Lambda alarm functions filter for resources whose names contain a -dev or -prod substring. Any new application or resource that follows this naming convention is included automatically.
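As an illustration of the naming convention in the note above, here is a minimal sketch of such a filter. The function names `environmentOf` and `monitoredResources` and the sample resource names are hypothetical; the project's own filter may be written differently.

```typescript
type Environment = "dev" | "prod"

// Returns the environment implied by a resource name, or undefined when the
// name contains neither the -dev nor the -prod substring.
// Assumption: -prod is checked first, so a name containing both counts as prod.
function environmentOf(resourceName: string): Environment | undefined {
  if (resourceName.includes("-prod")) return "prod"
  if (resourceName.includes("-dev")) return "dev"
  return undefined
}

// Keeps only the resources that follow the naming convention.
function monitoredResources(resourceNames: string[]): string[] {
  return resourceNames.filter((name) => environmentOf(name) !== undefined)
}

// Hypothetical resource names: the first two are kept, the third is skipped.
const monitored = monitoredResources(["orders-api-dev", "orders-api-prod", "scratch-bucket"])
console.log(monitored)
```

Because this is a plain substring match, a name such as `tools-developer` would also be treated as dev. A stricter check (for example, matching only a suffix) is a design choice to make if that matters.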

Creates and updates API Gateway endpoint alarms, triggered via aws.apigateway:CreateDeployment event

Creates and updates Lambda/ES/SNS/SQS alarms, triggered via aws.apigateway:CreateDeployment event

Creates and updates Autoscale/EC2 alarms, triggered via aws.autoscaling:UpdateAutoScalingGroup event

Sends subscribed CloudWatch log data to Elasticsearch

Classifies and tags existing log groups env=dev or env=prod

Classifies and tags new log groups env=dev or env=prod
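For the function that sends subscribed CloudWatch log data to Elasticsearch, the sketch below shows the general shape of the work. CloudWatch Logs delivers subscription data to Lambda as base64-encoded, gzip-compressed JSON under `awslogs.data`, and the Elasticsearch bulk API accepts newline-delimited action and document pairs. The helper names, the document fields, and the index name `app-logs` are assumptions for illustration, and the interfaces cover only a subset of the real payload.

```typescript
import { gunzipSync, gzipSync } from "zlib"

// Subset of the decoded CloudWatch Logs subscription payload
interface LogEvent { id: string; timestamp: number; message: string }
interface LogsPayload { logGroup: string; logStream: string; logEvents: LogEvent[] }

// Decodes the base64 + gzip envelope that a subscribed Lambda receives.
function decodeLogsEvent(event: { awslogs: { data: string } }): LogsPayload {
  const compressed = Buffer.from(event.awslogs.data, "base64")
  return JSON.parse(gunzipSync(compressed).toString("utf8")) as LogsPayload
}

// Builds a bulk-API body: one action line and one document line per log event.
// The document fields chosen here are hypothetical.
function toBulkBody(payload: LogsPayload, index: string): string {
  const lines: string[] = []
  for (const e of payload.logEvents) {
    lines.push(JSON.stringify({ index: { _index: index, _id: e.id } }))
    lines.push(JSON.stringify({
      "@timestamp": new Date(e.timestamp).toISOString(),
      message: e.message,
      logGroup: payload.logGroup,
      logStream: payload.logStream,
    }))
  }
  return lines.join("\n") + "\n" // the bulk API expects a trailing newline
}

// Round trip with a hand-built sample event (hypothetical values)
const sample: LogsPayload = {
  logGroup: "/aws/lambda/orders-api-dev",
  logStream: "2020/01/01/[$LATEST]abc",
  logEvents: [{ id: "1", timestamp: 0, message: "hello" }],
}
const event = { awslogs: { data: gzipSync(Buffer.from(JSON.stringify(sample))).toString("base64") } }
const bulkBody = toBulkBody(decodeLogsEvent(event), "app-logs")
console.log(bulkBody)
```

Sending `bulkBody` to the cluster (request signing, retries, error handling) is left out here because it depends on how the Elasticsearch domain is secured.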

Adding Alarms

Adding a new alarm is fairly straightforward once we have defined:

  1. What AWS events will trigger the creation of this alarm? Consider existing resources as well as new resources.
  2. What Resource group and metric will this alarm monitor?

Available resource metric namespaces: AWS/ApiGateway, AWS/Lambda, AWS/EC2, AWS/ES, AWS/SNS

Example AWS/EC2 Alarm:

  AlarmName: `EC2 for Autoscale group[${autoscaleGroup}] : Status Check Failed for over 10 min`,
  MetricName: 'StatusCheckFailed',
  Dimensions: [
    { Name: 'AutoScalingGroupName', Value: autoscaleGroup }
  ],
  Namespace: 'AWS/EC2', // StatusCheckFailed is an EC2 metric
  ComparisonOperator: 'GreaterThanOrEqualToThreshold',
  Period: 300, // seconds per datapoint
  Threshold: 1,
  EvaluationPeriods: 2,
  DatapointsToAlarm: 2, // 2 of 2 datapoints at 300 s each: about 10 min to trigger
  Statistic: 'Maximum',
  ActionsEnabled: true,
  AlarmActions: alarmActions,
  AlarmDescription: `auto-generated by Lambda [${process.env.AWS_LAMBDA_FUNCTION_NAME}]`,
  OKActions: okAction,
  TreatMissingData: 'notBreaching',
  Unit: 'Count' // StatusCheckFailed is reported as a count, not a duration
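When choosing `Period`, `EvaluationPeriods`, and `DatapointsToAlarm`, it helps to work out how long a resource must stay in breach before the alarm fires. The helper below is a rough sketch of that arithmetic; `AlarmTiming` and `secondsToAlarm` are hypothetical names, not part of the AWS SDK, and the result is an approximation of CloudWatch's actual evaluation timing.

```typescript
// Only the timing-related fields of an alarm definition
interface AlarmTiming {
  Period: number             // seconds per datapoint
  EvaluationPeriods: number  // N: most recent datapoints considered
  DatapointsToAlarm?: number // M: breaching datapoints required (defaults to N)
}

// Approximate minimum time, in seconds, of continuous breaching before the alarm fires.
function secondsToAlarm(timing: AlarmTiming): number {
  const datapoints = timing.DatapointsToAlarm ?? timing.EvaluationPeriods
  if (datapoints > timing.EvaluationPeriods) {
    throw new Error("DatapointsToAlarm cannot exceed EvaluationPeriods")
  }
  return timing.Period * datapoints
}

// The timing values from the EC2 example above: 300 s x 2 datapoints = 600 s
const ec2Seconds = secondsToAlarm({ Period: 300, EvaluationPeriods: 2, DatapointsToAlarm: 2 })
console.log(`${ec2Seconds / 60} minutes`)
```

With the values from the EC2 example, this gives 600 seconds, or about 10 minutes. For an alarm that fires after roughly one minute, a 60-second period with a single datapoint would be the starting point.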

Adding an alarm series to the main alarm function

  // Start every alarm series concurrently, then wait for all of them to finish
  await Promise.all([
    lambdaAlarms.createAlarms(),
    esAlarms.createAlarms(),
    snsAlarms.createAlarms(),
    queueAlarms.createAlarms()
  ])
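A new alarm series can follow the same shape as `lambdaAlarms`, `esAlarms`, `snsAlarms`, and `queueAlarms`: an object that exposes `createAlarms()`. The sketch below is one possible way to run a list of series concurrently while collecting failures, so that one failing series does not hide the results of the others. The `AlarmSeries` interface, the `name` field, and `createAllAlarms` are assumptions, not the project's actual types.

```typescript
// Hypothetical contract that each alarm series would satisfy
interface AlarmSeries {
  name: string
  createAlarms(): Promise<void>
}

// Runs every series concurrently and returns a description of each failure.
async function createAllAlarms(series: AlarmSeries[]): Promise<string[]> {
  const failures: string[] = []
  await Promise.all(
    series.map((s) =>
      s.createAlarms().catch((err: unknown) => {
        // Record the failure instead of rejecting the whole batch
        failures.push(`${s.name}: ${String(err)}`)
      })
    )
  )
  return failures
}

// Stand-in series for illustration; real ones would call the CloudWatch API
const demoSeries: AlarmSeries[] = [
  { name: "lambda", createAlarms: async () => {} },
  { name: "sns", createAlarms: async () => { throw new Error("boom") } },
]

const demoRun = createAllAlarms(demoSeries)
demoRun.then((failures) => console.log(failures))
```

Whether to fail the whole invocation when any series fails, or to log the failures and continue, depends on how the Lambda's own errors are monitored.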
