AWS Health Integration #8907

Closed
15 tasks done
SubhrataK opened this issue Jan 17, 2024 · 14 comments

@SubhrataK

SubhrataK commented Jan 17, 2024

AWS Health assists in effectively managing ongoing events. It offers continuous insight into the performance of your resources and the availability of your AWS services and accounts. By leveraging AWS Health events, users obtain valuable insights into how service and resource modifications may impact their applications hosted on AWS.

The AWS Health integration with Elastic will retrieve the following information:

DescribeEvents operation - Summary information about events that are related to an AWS account. The events can be related to AWS operational issues, scheduled changes to AWS infrastructure, or security and billing notifications.
DescribeEventDetails operation - Detailed information about one or more events, such as the AWS service, Region, Availability Zone, event start and end times, and a text description.
DescribeAffectedEntities operation - Information about entities that are affected by one or more events. The results can be filtered by additional criteria, such as status, that might be assigned to AWS resources.

High Level Design

Criteria

  • Metricbeat users are interested in upcoming and open events
  • Details of closed events are ignored
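
A minimal sketch of the selection rule above, assuming events carry one of the status codes "open", "upcoming", or "closed". The `Event` type and the `keepActive` helper are hypothetical and used only for illustration; they are not taken from the Metricbeat module.

```go
package main

import "fmt"

// Event is a simplified, hypothetical stand-in for an AWS Health event summary.
type Event struct {
	ARN    string
	Status string // assumed to be "open", "upcoming", or "closed"
}

// keepActive returns only the events that are open or upcoming,
// dropping closed events as described in the criteria.
func keepActive(events []Event) []Event {
	var active []Event
	for _, e := range events {
		switch e.Status {
		case "open", "upcoming":
			active = append(active, e)
		}
	}
	return active
}

func main() {
	events := []Event{
		{ARN: "arn:example:1", Status: "open"},
		{ARN: "arn:example:2", Status: "closed"},
		{ARN: "arn:example:3", Status: "upcoming"},
	}
	for _, e := range keepActive(events) {
		fmt.Println(e.ARN, e.Status)
	}
}
```

In practice the same selection could probably be pushed to the server through the status filter of the DescribeEvents request, which would avoid fetching closed events at all.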

Metric Fetch Mechanism

github.com/aws/aws-sdk-go-v2/service/health will be used to fetch the details of AWS Health

API Details

  • DescribeEvents
  • DescribeEventDetails
  • DescribeAffectedEntities
image

PR Link : elastic/beats#38370

Evaluation & Prototyping

  1. agithomas
  2. Team:Integrations
    agithomas

Test Scripts Development (Metricbeat Module)

Metricbeat module development

AWS Health Integration package Development

  1. New Integration Team:Service-Integrations
    agithomas

Release Process

@agithomas
Contributor

Debug Logs Format :

{"log.level":"debug","@timestamp":"2024-03-19T08:05:06.712Z","log.logger":"aws.awshealth","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/metricbeat/module/aws/awshealth.(*MetricSet).getEventsSummary","file.name":"awshealth/awshealth.go","file.line":196},"message":"[AWS Health] [DescribeEventDetails] Event ARN : arn:aws:health:us-east-1::event/RDS/AWS_RDS_PLANNED_LIFECYCLE_EVENT/AWS_RDS_PLANNED_LIFECYCLE_EVENT_XXXXXXXXXXXXXX, Affected Entities (Pending) : 2, Affected Entities (Resolved): 0, Affected Entities (Others) : 0","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-03-19T08:05:06.730Z","log.logger":"aws.awshealth","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/metricbeat/module/aws/awshealth.(*MetricSet).getEventsSummary","file.name":"awshealth/awshealth.go","file.line":196},"message":"[AWS Health] [DescribeEventDetails] Event ARN : arn:aws:health:eu-west-3::event/RDS/AWS_RDS_PLANNED_LIFECYCLE_EVENT/AWS_RDS_PLANNED_LIFECYCLE_EVENT_YYYYYYYYYYYYY, Affected Entities (Pending) : 2, Affected Entities (Resolved): 0, Affected Entities (Others) : 0","service.name":"metricbeat","ecs.version":"1.6.0"}

@agithomas
Contributor

During testing, it was noticed that even when there are multiple affected entities, not every entity ARN has an associated status.

In such cases, there won't be any summary information displayed and there won't be any associated status information displayed in the detailed view.

image

So, the count of ARNs must not be interpreted as the sum of aws.awshealth.affected_entities_others, aws.awshealth.affected_entities_pending, and aws.awshealth.affected_entities_resolved for a specific event ARN.
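
To illustrate why the totals can differ, here is a rough sketch under the assumption that an entity without a status is not counted in any of the three buckets. The `Entity` and `Counts` types, the `countByStatus` helper, and the status strings are hypothetical and not taken from the module.

```go
package main

import "fmt"

// Entity is a simplified, hypothetical affected entity.
type Entity struct {
	ARN    string
	Status string // empty when no status is reported for the entity
}

// Counts holds the per-status counters plus the total number of ARNs.
type Counts struct {
	Pending, Resolved, Others, Total int
}

// countByStatus buckets entities by status. Entities with no status are
// included in Total only (an assumption based on the observation above).
func countByStatus(entities []Entity) Counts {
	c := Counts{Total: len(entities)}
	for _, e := range entities {
		switch e.Status {
		case "":
			// No status reported: not counted in any bucket.
		case "PENDING":
			c.Pending++
		case "RESOLVED":
			c.Resolved++
		default:
			c.Others++
		}
	}
	return c
}

func main() {
	c := countByStatus([]Entity{
		{ARN: "arn:example:a", Status: "PENDING"},
		{ARN: "arn:example:b", Status: "RESOLVED"},
		{ARN: "arn:example:c"}, // no status
	})
	fmt.Printf("total=%d bucketed=%d\n", c.Total, c.Pending+c.Resolved+c.Others)
}
```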

@agithomas
Contributor

When a status is available for a resource ARN, AWS provides a view such as the one below, in contrast to the view mentioned here

image

@agithomas
Contributor

Not all events have an associated end time. In such cases, the end_time is stored as "end_time": "0001-01-01T00:00:00.000Z".
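
The stored value matches Go's zero `time.Time` (January 1, year 1, 00:00:00 UTC), so a consumer can likely detect a missing end time with `IsZero`. A small sketch, where `hasEndTime` is a hypothetical helper and not part of the module:

```go
package main

import (
	"fmt"
	"time"
)

// hasEndTime reports whether a stored end_time string carries a real value
// rather than the zero timestamp used when the event has no end time.
func hasEndTime(raw string) (bool, error) {
	t, err := time.Parse(time.RFC3339, raw)
	if err != nil {
		return false, err
	}
	return !t.IsZero(), nil
}

func main() {
	for _, raw := range []string{
		"0001-01-01T00:00:00.000Z", // placeholder for "no end time"
		"2024-03-19T08:05:06.712Z", // a real timestamp
	} {
		ok, err := hasEndTime(raw)
		fmt.Println(raw, ok, err)
	}
}
```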

@agithomas
Contributor

agithomas commented Mar 24, 2024

Dashboard layout

image

@agithomas
Contributor

agithomas commented Mar 24, 2024

I tried to export a dashboard with Metricbeat, using the command ./metricbeat export dashboard on the 8.9 stack. The attached JSON was created under the /_meta/kibana/8/dashboard path. However, when I run mage check, I get this error:

Error: there are format errors in dashboards .

Also, I see the message below:

Cannot modify all index pattern references in dashboard - module/aws/awshealth/_meta/kibana/8/dashboard/494194b0-e9d3-11ee-9f73-dfef113e2924.json
Please edit the dashboard override function named ReplaceIndexInDashboardObject in libbeat.

This error will be discussed with the ecosystem team to find out whether the command was used incorrectly or whether this is a known issue.

@agithomas
Contributor

Requested team review of the Metricbeat PR. Once it is merged, development of the AWS Health integration package will resume.

@agithomas
Contributor

Based on the feedback, the following changes are now being attempted:

  1. Make use of Paginators whenever possible
  2. Make use of aws pointer conversion APIs
  3. Avoid the usage of channels and pass EventARN as a batch. The exact batch size supported is to be determined.
  4. event.ID requirement
  5. Field name changes, especially related to count values
  6. Minor code refactoring
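
As a rough illustration of item 1, the loop below shows the token-following pattern that a paginator wraps, so that callers do not have to track the next token themselves. The `page` type, `fetchFunc`, and `collectAll` are hypothetical stand-ins written for this sketch; they are not the SDK's paginator types.

```go
package main

import "fmt"

// page is a hypothetical API response: some items plus a token for the next page.
type page struct {
	Items     []string
	NextToken string // empty when there are no more pages
}

// fetchFunc is a hypothetical call that returns the page for a given token.
type fetchFunc func(token string) (page, error)

// collectAll follows NextToken until the API reports no further pages.
func collectAll(fetch fetchFunc) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Items...)
		if p.NextToken == "" {
			return all, nil
		}
		token = p.NextToken
	}
}

func main() {
	// Fake backend with two pages, keyed by the token that requests them.
	pages := map[string]page{
		"":   {Items: []string{"event-1", "event-2"}, NextToken: "t1"},
		"t1": {Items: []string{"event-3"}},
	}
	all, err := collectAll(func(token string) (page, error) {
		return pages[token], nil
	})
	fmt.Println(all, err)
}
```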

@agithomas
Contributor

Based on the feedback, the following changes were attempted:

  1. Make use of Paginators whenever possible
    Completed
  2. Make use of aws pointer conversion APIs
    Completed
  3. Avoid the usage of channels and pass EventARN as a batch. The exact batch size supported is to be determined.
    Completed. With a batch size above 10, the following error is returned:
operation error Health: DescribeEventDetails, https response error StatusCode: 400, RequestID: 293e4861-1bdb-4b1b-9a41-2f81ab826f23, api error ValidationException: 1 validation error detected:
.......
at 'eventArns' failed to satisfy constraint: Member must have length less than or equal to 10

  4. event.ID requirement
    Addressed
  5. Field name changes, especially related to count values
    Addressed the comment. No corrective steps made
  6. Minor code refactoring
    Done
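
Given the limit of 10 event ARNs per DescribeEventDetails request reported in the error above, the batching in item 3 could look roughly like this. The `batch` helper and the `maxEventARNs` constant are hypothetical names for this sketch, not the module's actual code.

```go
package main

import "fmt"

// maxEventARNs mirrors the limit reported by the validation error above.
const maxEventARNs = 10

// batch splits arns into consecutive slices of at most size elements.
func batch(arns []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(arns); start += size {
		end := start + size
		if end > len(arns) {
			end = len(arns)
		}
		out = append(out, arns[start:end])
	}
	return out
}

func main() {
	arns := make([]string, 23)
	for i := range arns {
		arns[i] = fmt.Sprintf("arn:example:event/%d", i)
	}
	// Each batch would be passed to one DescribeEventDetails call.
	for i, b := range batch(arns, maxEventARNs) {
		fmt.Printf("batch %d: %d ARNs\n", i, len(b))
	}
}
```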

@agithomas
Contributor

Metricbeat changes are merged into the main branch.

@agithomas
Contributor

Setting the project status to "Waiting". The remaining enhancements will commence once AWS Health support in Metricbeat is available as part of elastic-agent.

@agithomas
Contributor

Resumed Integration Development.

System Test Result

image

@agithomas
Contributor

image

@agithomas
Contributor

PR merged
